Pollinator Potential [NO_POPO_001]

Authors
Affiliation

Kwaku Peprah Adjei

Norwegian institute for Nature Research

Anders Kolstad

Norwegian institute for Nature Research

Markus A.K. Sydenham

Norwegian institute for Nature Research

Published

October 23, 2024

Status

This indicator documentation is incomplete and needs further developement before indicator values can be calculated.

Recomended citation

Adjei, K. P, Kolstad, A. , Sydenham, M. 2024. Pollinator potential (ID: NO_POPO_001) v. 000.001. ecRxiv: https://github.com/NINAnor/ecRxiv/tree/main/indicators/NO_POPO

Version

000.001

Show metadata
Table 1: Indicator metadata
Indicator ID NO_POPO_001
Indicator Name Pollinator potential
Country Norway
Continent Europe
Ecosystem Condition Typology Class B1 - Compositional State Characteristics
Realm Terrestrial (T)
Biome T4 Savannas and grasslands biome
Ecosystem T4.5 Temperate grasslands
Year added 2024
Last update 2024
status incomplete
Version 000.001
Version comment First draft
url https://github.com/NINAnor/ecRxiv/tree/main/indicators/NO_POPO

Pollinator potential


In review

This indicator is in review and is subject to change




1. Introduction

We aim to model the ecosystem conditions in Norway given their known interactions with some plant species. We develop pollinator indicators which are used to describe ecosystem conditions by focusing on the ASO data meadows, which represent the semi-natural habitats we are interested in describing. To achieve this objective, we do the following:

    1. obtain data from the Global Biodiversity Infrastructure Facility (GBIF ) for bumblebees, butterflies and bees (which we will refer to as pollinators in this study) and plant hosts (only the plant species we have interaction data of). We also obtain covariates that will be needed for the model fit Table 8.
    1. fit an integrated species distribution model (ISDM) to the data and predict the intensities of each pollinator (average abundance of each pollinator within a grid cell) of across the alpine, boreal and temperate zones [REF to the table with zones]. Cluster all the pollinator species (over \(8000\) of them) either as alpine, boreal or temperate pollinators and assign each set of species the predicted intensities estimated. Also, fit an ISDM to the plant host data and predict their occurrence probability across the study region.
    1. obtain a web plot interaction matrix of the pollinator and plant hosts using data described in Section 1.9.4.
    1. using the intensities of the pollinator species and plant species occurrence probabilities (see (a)), together with the interaction matrix (see (c)), estimate the alpha diversity index (specifically species richness, Shannon and Simpson indices) of the pollinators across Norway.
    1. extract the diversity indices, latitude and annual temperature value for the ASO meadow locations. Cluster the diversity indices based on the geo-climatic region (defined by the latitude and annual temperature; see Table 4), and estimate the maximum diversity for each geo-climatic region.
    1. scale the rest of the diversity indices by this indicator value to obtain nominal indicators for the study region. We then describe the ecosystem conditions using a threshold value of our choice.

We summarise these steps in Figure 1.

Figure 1: Modelling framework used to describe the pollinator potential in this project. The modelling start with obtaining A) bees, bumblebees and hoverflies data from GBIF, B) plant host data from GBIF and C) climatic and trait covariates described in Table 8. The pollinator and plant host data used were both presence only (with locations indicated with black dots) and presence-absence (with presence indicated by the correct sign and absence indicated by the x sign). Using the data at hand, we fit integrated species distribution models (ISDMs) to predict the D) pollinator intensities and E) plant host occurrence probabilities. We then construct F) interraction between the plant and pollinators using the data described in Section 1.9.4 and G) estimate the alpha diversity indices. We extract the diversity indices at the ASO meadows and use these values to calculate te indicators H) ASO meadows and I)entire study region. The deep blue color indicates region with high diversity/indicators and light blue color indicate region with low diversity/indicators.

2. About the underlying data

The pollinator (GBIF.Org User 2024a) and plant species (GBIF.Org User 2024b) data used to fit the ISDMs were obtained from GBIF. The data was downloaded via the R-package rgbif (Chamberlain et al. 2024), and its associated metadata was extracted using the same R-package. The downloaded GBIF data contains \(26\) datasets and \(30\) datasets with records on pollinators and plants respectively. We present a summary of the datasets downloaded for the pollinator and plant species respectively in Table 6 and Table 7 respectively.

From the metadata, we ascertain the type of each dataset: either presence-only, presence-absence or count data. We merge all the presence-only data into one dataset (which we call mergedDatasetPO), but we do not merge the rest of the datasets. This is because the presence-absence data (processed from sampling event protocols) have different sampling protocols, and we would like to preserve their unique attributes. A summary of the formatted dataset used to fit the ISDMs are presented in Table 2 and Table 3.

Code
######################################################
# POLLINATOR DATA FORMAT
#####################################################

pollinatorFolder <- paste0(dataFolder,"/pollinatorDataFolder")
taxonKey <- c(811, 797, 1457)
formattedData <- list()

if(!file.exists(paste0(pollinatorFolder, "/formatData/formattedPollinatorData.RDS"))){
  
  #for(i in seq_along(taxonKey)){
 occurrences <- lapply(seq_along(taxonKey), function(i){
  
  # select the taxaKey for a particular pollinator
  taxaIndex <- taxonKey[i]
  
  #if(!file.exists(paste0(pollinatorFolder, "/", taxaIndex,"formattedData.RDS"))){
    
    if(!file.exists(paste0(pollinatorFolder, "/", taxaIndex,"occurrences.RDS"))){
      
      # Get download key 
      if (file.exists(paste0(pollinatorFolder, "/",taxaIndex, "DownloadKey.RDS"))) {
        downloadKey <- readRDS(paste0(pollinatorFolder, "/",taxaIndex, "DownloadKey.RDS"))
      } else {
        downloadKey <- getDownloadKey(taxaIndex, regionGeometry)
        if(waitForGbif){
          message("Download key has been created and will download once it is ready (5-30 minutes). ",
                  "View the download status at https://www.gbif.org/occurrence/download/", 
                  downloadKey)
          downloadKey <- occ_download_wait(downloadKey, curlopts = list(), quiet = FALSE)
          base::attr(downloadKey,'doi') <- downloadKey$doi
          saveRDS(downloadKey, file = paste0(pollinatorFolder, "/",taxaIndex, "DownloadKey.RDS"))
        } else {
          downloadKey <- occ_download_meta(downloadKey) 
          saveRDS(downloadKey, file = paste0(pollinatorFolder, "/",taxaIndex, "DownloadKey.RDS"))
          stop(paste0("Download key has been created and your download is being prepared. View the download at https://www.gbif.org/occurrence/download/",
                      downloadKey$key, ". Come back and start the download in 5-30 minutes."))
        }
      } 
      
      
      # Start GBIF Download  
      
      #### FORMAT SCHEDULED DOWNLOAD ####
      
      # This script takes your download key and downloads and produces a data frame containing all your 
      # relevant species occurrences. It also adds a few important columns.
      
      # Download and unzip the file from GBIF
      
      downloadGet <- occ_download_get(key=downloadKey$key, path=paste0(pollinatorFolder), overwrite=TRUE)
      occurrences <- occ_download_import(downloadGet)
      saveRDS(occurrences, 
              paste0(pollinatorFolder, "/", taxaIndex,"occurrences.RDS"))
    } else {
      occurrences <- readRDS(paste0(pollinatorFolder, "/", taxaIndex,"occurrences.RDS"))
    }
    # Create dataset types
    # using the default creation from the hotspot project
    # Import metadata information
    if(file.exists(paste0(dataFolder, "/metadataSummary.csv"))){
      dataTypes <- read.csv(paste0(dataFolder, "/metadataSummary.csv"))
    } 
    
    # Reduce to relevant columns immediately to save space
    occurrences <- occurrences[,c("acceptedScientificName", "decimalLongitude", "decimalLatitude", "basisOfRecord",
                                  "year", "month", "datasetKey", "datasetName", "taxonRank", "kingdomKey", "phylumKey", "classKey", "orderKey",
                                  "familyKey", "genusKey", "speciesKey", "issue", "genus", "order", "family", "occurrenceID")]
    
    # Remove any occurrences with certain issues attached to them
    issuesToFlag <- c("ZERO_COORDINATE|COORDINATE_OUT_OF_RANGE|COORDINATE_INVALID|COORDINATE_PRECISION_INVALID|COORDINATE_UNCERTAINTY_METRES_INVALID")
    occurrences <- occurrences %>%
      filter(datasetKey %in% dataTypes$datasetKey[!is.na(dataTypes$processing)]) %>%
      filter(!grepl(issuesToFlag,issue))
    
    # Assign a taxon key based on what level of taxonomy the key is valid for
    occurrences$taxonKeyProject <- ifelse(occurrences$speciesKey %in% taxonKey, occurrences$speciesKey,
                                          ifelse(occurrences$genusKey %in% taxonKey, occurrences$genusKey,
                                                 ifelse(occurrences$familyKey %in% taxonKey, occurrences$familyKey,
                                                        ifelse(occurrences$orderKey %in% taxonKey, occurrences$orderKey,
                                                               ifelse(occurrences$classKey %in% taxonKey, occurrences$classKey,
                                                                      ifelse(occurrences$phylumKey %in% taxonKey, occurrences$phylumKey,
                                                                             ifelse(occurrences$kingdomKey %in% taxonKey, occurrences$kingdomKey, NA)))))))
    
    # Add taxa and reduce to relevant columns
    occurrences$taxa <- taxaIndex #focalTaxon$taxa[match(occurrences$taxonKeyProject, focalTaxon$key)]
    
    occurrences <- occurrences[,c("acceptedScientificName", "decimalLongitude", "decimalLatitude", "basisOfRecord",
                                  "year", "month", "datasetKey", "datasetName", "taxa", "taxonKeyProject", "taxonRank", "genus", "order", "family", "occurrenceID")] %>%
      filter(!is.na(taxa))
    # }
    
  })
 
    # Put all groups together
     occurrences <- occurrences %>%
       do.call("rbind", .)%>%
       dplyr::filter(order %in% c("Diptera", "Lepidoptera", "Hymenoptera"))%>%
       dplyr::filter(year > 2009)%>%
       dplyr::mutate(Taxon = ifelse(order == "Lepidoptera", "Butterflies",
                                                           ifelse(order == "Hymenoptera", "Bees",
                                                                  "Hoverflies")))
    ###------------------------------###
    ### 3. Attach relevant metadata ####
    ###------------------------------###
    
    # Now we import metadata related to GBIF data
    metadataList <- metadataPrep(occurrences, 
                                 metaSummary = TRUE)
    
    # Import dataset type based on dataset name. If no dataset information is provided, all data will be downloaded and assumed to
    # be presence only data
    GBIFImportCompiled <- merge(occurrences, 
                                metadataList$metadata, 
                                all.x=TRUE, 
                                by = "datasetKey")
    
    # Narrow down to known data types and split into data frames
    projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
    GBIFLists <- lapply(unique(GBIFImportCompiled$name), FUN  = function(x) {
      GBIFItem <- GBIFImportCompiled[GBIFImportCompiled$name == x,]
      GBIFItem <- st_as_sf(GBIFItem,                         
                           coords = c("decimalLongitude", "decimalLatitude"),
                           crs = projcrs)%>%
        st_transform(., newCrs)
      GBIFcropped <- st_intersection(GBIFItem, regionGeometry)
      GBIFcropped
    })
    
    names(GBIFLists) <- unique(GBIFImportCompiled$name)
    
    # Put the dataLists together and save them
    dataList <- list(species = GBIFLists, 
                     metadata = metadataList, 
                     projcrs = newCrs)
    base::attr(dataList, "level") <- base::attr(regionGeometry, "level")
    base::attr(dataList, "region") <- base::attr(regionGeometry, "region")
    saveRDS(dataList, 
            paste0(pollinatorFolder, "/speciesDataImported.RDS"))
    
    ##########################
    # Process datasets
    ##########################
    speciesData <- dataList[["species"]]
    metadata <- dataList$metadata$metadata
    rm("dataList")
    
    # start processing each dataset
    processedData <- list()
    namesProcessedData <- c()
    for (ds in seq_along(speciesData)) {
      print(ds)
      focalData <- speciesData[[ds]]
      
      # If the dataset is empty, skip it
      if (nrow(focalData) == 0) next
      
      dataType <- unique(focalData$type)
      #if(is.null(dataType)) dataType <- "OCCURRENCE"
      datasetName <- names(speciesData)[ds]
      newDataset <- NULL
      
      cat("Currently processing dataset '", datasetName,"' \n", sep = "")
      
      if (dataType == "SAMPLING_EVENT") {
        focalEndpoint <- metadata$DWCEndpoint[metadata$name == datasetName]
        newDataset <- tryCatch(processFieldNotesEventGenus(focalEndpoint, pollinatorFolder , datasetName, regionGeometry, taxonKey, newCrs, projcrs, focalData),
                               error = function(e) {NULL})
        
        # No need to do anything to presence only data (yet) except add individualCount column
      } else if (dataType == "OCCURRENCE") {
        
        #ensure the coordinates are distrinct
       # coordsgbif.no <- datagbif.no[,c("lon","lat")]
       
        focalData <- focalData %>%
          dplyr::group_by(Taxon) %>%
          dplyr::distinct(geometry)
        
        focalData$dataType <- "PO"
        newDataset <- focalData[,c("acceptedScientificName", "geometry", "dataType", "taxa", "year", "taxonKeyProject", "order")]
      } else if (dataType == "insectMonitoring") {
        focalEndpoint <- metadata$DWCEndpoint[metadata$name == datasetName]
        newDataset <- processNationalInsectMonitoring(focalData, focalEndpoint, pollinatorFolder, regionGeometry)
      }
      
      if (is.null(newDataset)) {next}
      
      
      # convert year to numeric
      newDataset$year <- as.numeric(newDataset$year)
      
      # Save and name new dataset
      processedData[[ds]] <- newDataset
      namesProcessedData[ds] <- datasetName
    }
    
    names(processedData) <- namesProcessedData
    
    #extract processed Data without NA names
    processedData <- processedData[namesProcessedData[!is.na(namesProcessedData)]]
    # Save for use in model construction
    processedData <- processedData[lapply(processedData,nrow)>0]
    saveRDS(processedData, paste0(pollinatorFolder, "/speciesDataProcessed.RDS"))

    #Check if all the datasets are of the same dataType
    mergedDataWithType <- lapply(processedData, function(x){
      unique(x$dataType)
    })%>%
      do.call("rbind", .)%>%
      as.data.frame(.)%>%
      mutate(datasetName = rownames(.))%>%
      rename(dataType = V1)
    
    uniqueDataType <- unique(mergedDataWithType$dataType)
    
    mergedDatasets <- list()
    
    # Merge all presence-only data
    
      dataToMerge <- mergedDataWithType[mergedDataWithType$dataType == "PO", ]
      mergedDatasets[[1]] <- processedData[dataToMerge$datasetName]%>%
        do.call("rbind", .)%>%
        as.data.frame(., row.names = NULL)%>%
        dplyr::group_by(acceptedScientificName, dataType, taxa, year, taxonKeyProject, order)%>%
        distinct(geometry)%>%
        dplyr::ungroup()
      
      rownames(mergedDatasets[[1]]) <- NULL
      
      mergedDatasets[[1]] <- mergedDatasets[[1]] %>%
        st_as_sf(., crs = newCrs)%>%
        filter(!is.na(order))%>%
        dplyr::select(order, dataType) %>%
        dplyr::group_by(dataType, order)%>%
        distinct()%>%
        dplyr::ungroup()
      
      names(mergedDatasets)[1] <- paste0("mergedDataset", "PO")
      
      # Format PA data
      dataToMerge <- mergedDataWithType[mergedDataWithType$dataType == "PA", ]
      
     paData <-  lapply(dataToMerge$datasetName, function(x){
        processedData[[x]] %>%
         dplyr::group_by(individualCount, dataType, taxa, year, taxonKeyProject, order) %>%
         distinct(geometry)%>%
         dplyr::ungroup()
      })
      
     names(paData) <- dataToMerge$datasetName
     
     # Format count data
     dataToMerge <- mergedDataWithType[mergedDataWithType$dataType == "Counts", ]
     countData <- processedData[dataToMerge$datasetName]

    # Put all datasets togehter      
    mergedDatasets <- c(mergedDatasets,
                        countData,
                        paData
    )

saveRDS(mergedDatasets,
        paste0(pollinatorFolder, "/formatData/formattedPollinatorData.RDS"))
} else {

  pollinatorData <- readRDS(paste0(pollinatorFolder, "/formatData/formattedPollinatorData.RDS"))
}


# Format the data
PointsBeesAndButterfliesAndHoverflies <- lapply(pollinatorData, function(x){
  ret <-  x %>%
    dplyr::mutate(Taxon = ifelse(order == "Lepidoptera", "Butterflies",
                                 ifelse(order == "Hymenoptera", "Bees",
                                        "Hoverflies")))
  return(ret)
})

# Shorten the names of the datasets
shortNames <- str_replace_all(names(PointsBeesAndButterfliesAndHoverflies), " ", "")
names(PointsBeesAndButterfliesAndHoverflies) <- shortNames

# We need to filter out some taxonomic groups
beesOnly <-  c("Solitarybeescollectedinalarge-scalefieldexperimentinpowerlineclearings,southeastNorway" , "Bumblebeescollectedinalarge-scalefieldexperimentinpowerlineclearings,southeastNorway")
 ret <- lapply(PointsBeesAndButterfliesAndHoverflies[beesOnly], function(x){
   x %>%
     filter(Taxon == "Bees")
 })
 
 PointsBeesAndButterfliesAndHoverflies[beesOnly] <- ret

 beesAndButterfles <-  c("BumblebeesandbutterfliesinNorway")
 ret <- lapply(PointsBeesAndButterfliesAndHoverflies[beesAndButterfles], function(x){
   x %>%
     filter(Taxon != "Hoverflies")
 })
 
 PointsBeesAndButterfliesAndHoverflies[beesAndButterfles] <- ret
 
 
 ################################################
 # PLANT SPECIES FORMAT
 ####################################################

 # The plant data format is similar to the pollinators
 # Except we select the species within the pollinator-plants interaction data.
 
if(!file.exists(paste0(dataFolder, "/plantDataFolder/formattedData/formattedPlantData.RDS"))){
taxonKey <- 7707728
  if (file.exists(paste0(dataFolder, "/downloadKey.RDS"))) {
    downloadKey <- readRDS(paste0(dataFolder, "/downloadKey.RDS"))
  } else {
    downloadKey <- getDownloadKey(taxonKey, regionGeometry)
    if(waitForGbif){
      message("Download key has been created and will download once it is ready (5-30 minutes). ",
              "View the download status at https://www.gbif.org/occurrence/download/", 
              downloadKey)
      downloadKey <- occ_download_wait(downloadKey, curlopts = list(), quiet = FALSE)
      base::attr(downloadKey,'doi') <- downloadKey$doi
      saveRDS(downloadKey, file = paste0(dataFolder, "/downloadKey.RDS"))
    } else {
      downloadKey <- occ_download_meta(downloadKey) 
      saveRDS(downloadKey, file = paste0(dataFolder, "/downloadKey.RDS"))
      stop(paste0("Download key has been created and your download is being prepared. View the download at https://www.gbif.org/occurrence/download/",
                  downloadKey$key, ". Come back and start the download in 5-30 minutes."))
    }
  }
  
  # Start GBIF Download  
  
  #### FORMAT SCHEDULED DOWNLOAD ####
  
  # This script takes your download key and downloads and produces a data frame containing all your 
  # relevant species occurrences. It also adds a few important columns.
  
  # Download and unzip the file from GBIF
if(!file.exists(paste0(dataFolder, "/plantDataFolder/data/occurrences.RDS"))){
  downloadGet <- occ_download_get(key=downloadKey$key, path=paste0(dataFolder, "/plantDataFolder/data"), overwrite=TRUE)
  occurrences <- occ_download_import(downloadGet)
  saveRDS(occurrences, 
          paste0(dataFolder, "/plantDataFolder/data/occurrences.RDS"))
} else {
  occurrences <- readRDS(paste0(dataFolder, "/plantDataFolder/data/occurrences.RDS"))
}
  # Create dataset types
  # using the default creation from the hotspot project
  # Import metadata information
  if(file.exists(paste0(dataFolder, "/metadataSummary.csv"))){
    dataTypes <- read.csv(paste0(dataFolder, "/metadataSummary.csv"))
  } 
  
  # Reduce to relevant columns immediately to save space
  occurrences <- occurrences[,c("acceptedScientificName", "decimalLongitude", "decimalLatitude", "basisOfRecord",
                                "year", "month", "datasetKey", "datasetName", "taxonRank", "kingdomKey", "phylumKey", "classKey", "orderKey",
                                "familyKey", "genusKey", "speciesKey", "issue", "genus", "order")]
  
  # Remove any occurrences with certain issues attached to them
  issuesToFlag <- c("ZERO_COORDINATE|COORDINATE_OUT_OF_RANGE|COORDINATE_INVALID|COORDINATE_PRECISION_INVALID|COORDINATE_UNCERTAINTY_METRES_INVALID")
  occurrences <- occurrences %>%
    filter(datasetKey %in% dataTypes$datasetKey[!is.na(dataTypes$processing)]) %>%
    filter(!grepl(issuesToFlag,issue))
  
  # Assign a taxon key based on what level of taxonomy the key is valid for
  occurrences$taxonKeyProject <- ifelse(occurrences$speciesKey %in% taxonKey, occurrences$speciesKey,
                                        ifelse(occurrences$genusKey %in% taxonKey, occurrences$genusKey,
                                               ifelse(occurrences$familyKey %in% taxonKey, occurrences$familyKey,
                                                      ifelse(occurrences$orderKey %in% taxonKey, occurrences$orderKey,
                                                             ifelse(occurrences$classKey %in% taxonKey, occurrences$classKey,
                                                                    ifelse(occurrences$phylumKey %in% taxonKey, occurrences$phylumKey,
                                                                           ifelse(occurrences$kingdomKey %in% taxonKey, occurrences$kingdomKey, NA)))))))
  
  # Add taxa and reduce to relevant columns
  occurrences$taxa <- "plants"#focalTaxon$taxa[match(occurrences$taxonKeyProject, focalTaxon$key)]
  
  occurrences <- occurrences[,c("acceptedScientificName", "decimalLongitude", "decimalLatitude", "basisOfRecord",
                                "year", "month", "datasetKey", "datasetName", "taxa", "taxonKeyProject", "taxonRank", "genus", "order")] %>%
    filter(!is.na(taxa))
  
  # Get the data with interractions
  source("pipeline/webForInterractions/webPlotsForModels.R")
  
  # select only the species within the interraction matrix
  occurrences <- occurrences %>%
  mutate(
    simpleScientificName = coalesce(
      #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
      str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    )
  )%>%
    dplyr::filter(simpleScientificName %in% interractionMatrix$species)%>%
    dplyr::filter(year > 2009)
  
###------------------------------###
### 3. Attach relevant metadata ####
###------------------------------###

# Now we import metadata related to GBIF data
metadataList <- metadataPrep(occurrences, metaSummary = TRUE)

# Import dataset type based on dataset name. If no dataset information is provided, all data will be downloaded and assumed to
# be presence only data
GBIFImportCompiled <- merge(occurrences, 
                            metadataList$metadata, 
                            all.x=TRUE, 
                            by = "datasetKey")

# Narrow down to known data types and split into data frames
projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
GBIFLists <- lapply(unique(GBIFImportCompiled$name), FUN  = function(x) {
  GBIFItem <- GBIFImportCompiled[GBIFImportCompiled$name == x,]
  GBIFItem <- st_as_sf(GBIFItem,                         
                       coords = c("decimalLongitude", "decimalLatitude"),
                       crs = projcrs)%>%
    st_transform(., newCrs)
  GBIFcropped <- st_intersection(GBIFItem, regionGeometry)
  GBIFcropped
})

names(GBIFLists) <- unique(GBIFImportCompiled$name)

# Put the dataLists together and save them
dataList <- list(species = GBIFLists, 
                 metadata = metadataList, 
                 projcrs = newCrs)
base::attr(dataList, "level") <- base::attr(regionGeometry, "level")
base::attr(dataList, "region") <- base::attr(regionGeometry, "region")
saveRDS(dataList, 
        paste0(dataFolder, "/plantDataFolder/formattedData/speciesDataImported.RDS"))

##########################
# Process datasets
##########################
speciesData <- dataList[["species"]]
metadata <- dataList$metadata$metadata
rm("dataList")


# start processing each dataset
processedData <- list()
namesProcessedData <- c()
for (ds in seq_along(speciesData)) {
  focalData <- speciesData[[ds]]
  
  # If the dataset is empty, skip it
  if (nrow(focalData) == 0) next
  
  dataType <- unique(focalData$type)
  #if(is.null(dataType)) dataType <- "OCCURRENCE"
  datasetName <- names(speciesData)[ds]
  newDataset <- NULL
  
  cat("Currently processing dataset '", datasetName,"' \n", sep = "")
  
 if (dataType == "SAMPLING_EVENT") {
   surveyedSpecies <- interractionMatrix$species
    focalEndpoint <- metadata$DWCEndpoint[metadata$name == datasetName]
    newDataset <- tryCatch(processFieldNotesEvent(focalEndpoint, dataFolder , datasetName, regionGeometry, taxonKey, newCrs, projcrs, focalData, surveyedSpecies),
                           error = function(e) {NULL})
    
    # No need to do anything to presence only data (yet) except add individualCount column
  } else if (dataType == "OCCURRENCE") {
    focalData$dataType <- "PO"
    newDataset <- focalData[,c("acceptedScientificName", "geometry", "dataType", "taxa", "year", "taxonKeyProject", "genus", "order")]
  }
  
  if (is.null(newDataset)) {break}

  
  # convert year to numeric
  newDataset$year <- as.numeric(newDataset$year)
  
  # Save and name new dataset
  processedData[[ds]] <- newDataset
  namesProcessedData[ds] <- datasetName
}

names(processedData) <- namesProcessedData

# Save for use in model construction
processedData <- processedData[lapply(processedData,nrow)>0]
saveRDS(processedData, paste0(dataFolder, "/plantDataFolder/formattedData/speciesDataProcessed.RDS"))


###--------------------------------###
### 3. Compile into one data.frame ####
###--------------------------------###

# Edit data frames to have the same number of columns
processedDataCompiled <- do.call(rbind, lapply(1:length(processedData), FUN = function(x) {
  dataset <- processedData[[x]]
  datasetName <- names(processedData)[x]
  datasetType <- unique(dataset$dataType)
  if (datasetType == "PO") {
    dataset$individualCount <- 1
  }
  datasetShort <- dataset[, c("acceptedScientificName", "individualCount", "geometry", "taxa", "year", "dataType", 
                              "taxonKeyProject", "genus", "order")]
  datasetShort$dsName <- datasetName
  datasetShort
}))

# Remove absences, combine into one data frame and add date accessed
processedPresenceData <- processedDataCompiled[processedDataCompiled$individualCount > 0,]
saveRDS(processedPresenceData, paste0(dataFolder, "/plantDataFolder/processedPresenceData.RDS"))

rm("GBIFLists")
rm("GBIFImportCompiled")
rm("newDataset")

############################
# Matching Processed data species to that from ASO data
#########################
# We are using ASO data as the basis for modelling the plant interractions
# So the species in the ASOdata are selected and 
# all datasets are filtered to have those species
asoDataNames <- processedData[[19]]
saveRDS(asoDataNames, paste0(dataFolder, "/plantDataFolder/formattedData/asoData.RDS"))
asoDataNames <- unique(asoDataNames$acceptedScientificName)

# selecting the ASO data species 
formattedData <- lapply(processedData, function(x){
  out <- x %>%
    dplyr::filter(acceptedScientificName %in% asoDataNames)
  return(out)
})

rm("processedData")
rm("occurrences")
rm("speciesData")
#Check if all the datasets are of the same dataType
mergedDataWithType <- lapply(formattedData, function(x){
  unique(x$dataType)
})%>%
  do.call("rbind", .)%>%
  as.data.frame(.)%>%
  mutate(datasetName = rownames(.))%>%
  rename(dataType = V1)

uniqueDataType <- unique(mergedDataWithType$dataType)

# create the merged dataset
mergedDatasets <- list()
#for(i in seq_along(uniqueDataType)){

# Merge all Presence-only datasets together
  dataToMerge <- mergedDataWithType[mergedDataWithType$dataType == "PO", ]
  mergedDatasets[[1]] <- formattedData[dataToMerge$datasetName]%>%
    do.call("rbind", .)%>%
    as.data.frame(., row.names = NULL)
  
  rownames(mergedDatasets[[1]]) <- NULL
  
  mergedDatasets[[1]] <- mergedDatasets[[1]] %>%
    st_as_sf(., crs = newCrs)%>%
    mutate(
      simpleScientificName = coalesce(
        #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
        str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
      )
    )%>%
    dplyr::filter(simpleScientificName %in% interractionMatrix$species)
  
  names(mergedDatasets)[1] <- paste0("mergedDataset", "PO")
  
  otherData <- mergedDataWithType[!mergedDataWithType$dataType %in% "PO", "datasetName"] %>%
    lapply(., function(x){
      print(x)
      #ds <- which(x %in% mergedDataWithType[!mergedDataWithType$dataType %in% "PO", "datasetName"])
      mergeData <- formattedData[x][[1]]%>%
        st_as_sf(., crs = newCrs)%>%
        mutate(
          simpleScientificName = coalesce(
            #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
            str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
          )
        )%>%
        dplyr::filter(simpleScientificName %in% interractionMatrix$species)
      
      #names(mergeData) <- x
      return(mergeData)
    })
  
  names(otherData) <- mergedDataWithType[!mergedDataWithType$dataType %in% "PO", "datasetName"]
  
  mergedDatasets <- c(mergedDatasets,
                      otherData)
  # Kepp the individual Presence-only data
  
#}

# save the data
saveRDS(mergedDatasets, paste0(dataFolder, "/plantDataFolder/formattedData/formattedPlantData.RDS"))
} else {

  plantSpeciesData <- readRDS(paste0(dataFolder, "/plantDataFolder/formattedData/formattedPlantData.RDS"))
  } 
Table 2: Number of pollinator occurrences from the formatted datasets obtained from GBIF. All presence-only datasets are merged into one dataset (mergedDataset).
Dataset name No. of bees occurrences No. of butterflies occurrences No. of hoverflies occurrences
mergedDatasetPO 22316 61009 17555
National insect monitoring in Norway 182 157 241
Saproxylic insects caught in window traps and hatched from polypores in small and large old forests in southern Norway 5 5 0
Solitary bees collected in a large-scale field experiment in power line clearings, southeast Norway 391 0 0
Freshwater benthic invertebrates ecological collection NTNU University Museum 3562 3571 5439
Bumble bees collected in a large-scale field experiment in power line clearings, southeast Norway 78 0 0
Bumblebees and butterflies in Norway 5996 5370 0
Freshwater pelagic invertebrates ecological collection NTNU University Museum 1605 0 0
Table 3: Number of plant occurrences from the formatted datasets obtained from GBIF. All presence-only datasets are merged into one dataset (mergedDataset).
Species mergedDatasetPO Monitoring data of natural and man-made semi-natural meadows in and around Oslo, Norway 2018-2021 Vegetation data with and without experimental warming, alpine Finse 2000, 2004, 2011 Effects of vegetation clearing on vascular plants in power line clearings southeast Norway Vascular plants in power line clearings and the nearby forest, southeast Norway Overvåking av semi-naturlig eng (ASO)
Ajuga pyramidalis 4003 22 720 1533 599 147
Astragalus alpinus 1851 22 720 1533 599 147
Campanula rotundifolia 11703 22 720 1533 599 147
Carum carvi 2584 22 720 1533 599 147
Filipendula ulmaria 15967 22 720 1533 599 147
Galium album 3624 22 720 1533 599 147
Galium aparine 1722 22 720 1533 599 147
Galium boreale 7360 22 720 1533 599 147
Galium elongatum 1399 22 720 1533 599 147
Galium palustre 2180 22 720 1533 599 147
Galium saxatile 1299 22 720 1533 599 147
Galium uliginosum 1761 22 720 1533 599 147
Galium verum 3276 22 720 1533 599 147
Geranium robertianum 5497 22 720 1533 599 147
Geranium sylvaticum 13861 22 720 1533 599 147
Gymnadenia conopsea 3179 22 720 1533 599 147
Hieracium murorum 20 22 720 1533 599 147
Hieracium umbellatum 3663 22 720 1533 599 147
Hieracium vulgatum 26 22 720 1533 599 147
Hypochaeris radicata 1287 22 720 1533 599 147
Knautia arvensis 6030 22 720 1533 599 147
Lathyrus linifolius 6118 22 720 1533 599 147
Lathyrus pratensis 5458 22 720 1533 599 147
Lathyrus vernus 2013 22 720 1533 599 147
Leucanthemum vulgare 9160 22 720 1533 599 147
Lotus corniculatus 11190 22 720 1533 599 147
Nardus stricta 5966 22 720 1533 599 147
Origanum vulgare 2178 22 720 1533 599 147
Plantago lanceolata 5091 22 720 1533 599 147
Plantago major 6525 22 720 1533 599 147
Plantago media 2045 22 720 1533 599 147
Silene dioica 6888 22 720 1533 599 147
Silene vulgaris 3343 22 720 1533 599 147
Stellaria graminea 7784 22 720 1533 599 147
Stellaria media 3723 22 720 1533 599 147
Stellaria nemorum 3694 22 720 1533 599 147
Trifolium medium 4394 22 720 1533 599 147
Trifolium pratense 11519 22 720 1533 599 147
Trifolium repens 10479 22 720 1533 599 147
Trollius europaeus 2856 22 720 1533 599 147
Valeriana sambucifolia 106 22 720 1533 599 147
Veronica alpina 403 22 720 1533 599 147
Veronica chamaedrys 8302 22 720 1533 599 147
Veronica officinalis 10738 22 720 1533 599 147
Veronica serpyllifolia 711 22 720 1533 599 147
Vicia cracca 9257 22 720 1533 599 147
Vicia sepium 5705 22 720 1533 599 147
Vicia sylvatica 1969 22 720 1533 599 147
Viola biflora 2990 22 720 1533 599 147
Viola canina 3163 22 720 1533 599 147
Viola epipsila 423 22 720 1533 599 147
Viola palustris 7853 22 720 1533 599 147
Viola riviniana 8650 22 720 1533 599 147
Viola tricolor 3667 22 720 1533 599 147

2.1 Spatial and temporal resolution

For both pollinator and plant data obtained on a National scale (entire Norway), we used those observed within \(2010\) to \(2024\). We present the study region in Figure 2.

Figure 2: The geographic region used for this study.

We describe the indicators at the semi-natural habitats from the ASO Data meadows. The ASO Data locations and their presence/absence records are presented in Figure 10.

2.2 Original units

The original unit of the each dataset obtained from GBIF for the pollinators and plants are provided in Table 6 and Table 7 respectively. The dataset are either presence-only and presence-absence.

2.3 Additional comments about the dataset

3. Indicator properties

3.1. ECT

B1 - Compositional state characteristics

The indicator describes the diversity in biological communities.

3.2. Ecosystem condition characteristic

High indicator values represents meadows with intact plant communities that can facilitate the life cycles of many pollinators.

3.3. Other standards

In the Norwegian typology, yhe indicator is classed as biologisk mangfold.

3.4. Collinearities with other indicators

The indicator may be correlated with other indicators of plant or insect diversity, such as those using data from the Norwegian national insect monitoring.

4. Reference condition and values

4. 1. Reference condition

The reference condition is defined as the maximum ecological potential of pollinators in semi-natural meadows in Norway. This reference condition represents a state where all conditional indicators have a value of \(1\) (Keith et al. 2020). Our reference conditions are based on modeled reference conditions (Hickler et al. 2012; Keith et al. 2020) of ASO meadows. Specifically, the pollinator potential at a given reference condition represents the highest diversity within a geo-climatic region. Although we estimated three alpha diversity indices (see Section 1.9.5), we chose to use the inverse-Simpson index to describe the diversity, since it disproportionately favors common species than the species richness and Shannon-Weiener indices (Jost 2006). An increase in the inverse Simpson index reflects an increase in diversity (Zhou et al. 2002).

An advantage of this model-based reference condition is that we can describe a national indicators for difference landscapes or ecosystems, with reference to the semi-natural habitats or ecosystems, and describe ecosystem variables that can affect the conditions within these landscapes. As a downside, it becomes difficult to scale conditions at levels with indicator values higher than the reference (Keith et al. 2020), especially for regions where alpha diversity is higher than those in the semi-natural grasslands.

4. 2. Reference values

As defined by Keith et al. (2020), we define the value of our ecosystem variable (alpha diversity described in Section 1.9.5) at reference condition and this value will be used to re-scale a variable to derive individual indicators. Putting all pollinators together, we set \(0\) and the maximum diversity value as the endpoints of the range of a condition variable we use in re-scaling. A value of \(0\) shows a poor condition where there is no pollinator diversity and a value of \(1\) indicates a good condition with the highest potential pollinator diversity.

In summary, there are two steps to obtaining these reference values:

  • Estimating the maximum values and re-scaling the ecosystem variable
  • Defining a threshold value for defining good ecological condition

4.2.1 Maximum values

We divide our semi-natural meadows into geo-climatic regions (that is, they are clustered according to their latitude and annual temperature values). This is a simplification as compared to using a more stable and partly historic vegetation zones (described by feltkurskompendium biologisk mangfold terrestrisk biologi and store norske leksikon). Moreover, this clusters allows us to track species distribution with rising temperatures and the latitude accounts for lag time in establishment [REF].

In each of these clusters, we estimate the maximum alpha diversity index and use this value as reference value for that geo-climatic region. The estimates of the maximum diversity indices within each geo-climatic regions is presented in Table 4.

Table 4: The maximum values of the alpha diversity (inverse Simpson index) estimated within difference geo-climatic regions based on their scaled latitude and annual temperature values at the ASO data meadows.
Geo climatic region Latitude Temperature Maximum value
1 Positive Positive 7714.728
2 Negative Positive 7552.012
4 Negative Negative 6021.358

Using the reference values presented in Table 4, we use a linear transformation to calculate the ecosystem condition indicators (Keith et al. 2020): \[ Indicator = \frac{D - V_L}{V_H - V_L}; \] where \(D\) is the alpha diversity estimate, \(V_L\) is the lowest reference level (which we have set to \(0\)) and \(V_H\) is the maximum reference value. We present these indicator values for the ASO meadows and the national scale in Figure 3. Note that the indicators on the national scale needs to be interpreted with reference to the ASO meadows.

Code
speciesForTaxon <- biotraits %>%
  dplyr::select(acceptedScientificName, Taxon, zone)%>%
  group_by(Taxon, zone)%>%
  distinct()%>%
  ungroup()%>%
  mutate(
    simpleScientificName = coalesce(
      #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
      str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    ))%>%
  na.omit()

# Load all the plant predictions result
if(!exists("interractionMatrix")) source("pipeline/webForInterractions/webPlotsForModels.R")

asoDatasf <- readRDS(file = paste0(dataFolder, "/plantDataFolder/formattedData/asoData.RDS"))%>%
  mutate(
    simpleScientificName = coalesce(
      #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
      str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    )) %>%
  dplyr::filter(simpleScientificName %in% c(interractionMatrix$species)) #

plantSpecies <- unique(asoDatasf$simpleScientificName)

plantSpecies <- sapply(plantSpecies, function(x){
  rt <- stringr::str_split(x, " ")[[1]]
  taxon <- paste(rt[1], rt[2], sep = "_") 
  return(taxon)
}) %>%
  c()

plantSpecies <- as.character(plantSpecies)


##################################################
# Plant Species results
#########################################

speciesData <- lapply(paste0(dataFolder, "/modelOutputs/plants/", plantSpecies), function(x){
 # try(list.files(x, pattern = paste0("predictions.rds"), recursive = TRUE, full.names = TRUE))
  preds <- readRDS(list.files(x, pattern = paste0("predictions.rds"), recursive = TRUE, full.names = TRUE))
  return(preds$Probabilities)
})|>  
  rast() |>  # combine raster layers
  #scale() |>  # scale raster layers
  setNames(plantSpecies)

if(rerun){

######################################
# ZONAL DIVERSITY ESTIMATES
#####################################
zone <- c("alpine", "boreal", "temperate")
# Make predictions for all the zones
sp <- "allTaxa"
spPred <- estimateTaxonDiversity(pollinatorSpeciesNames = NULL,
                                 speciesForTaxon =  speciesForTaxon,
                                 taxa = sp,
                                 predRast = predRast,
                                 silent = FALSE)   

path <- paste0(dataFolder, "/modelOutputs/diversity/", sp)
if(!file.exists(path)){
  dir.create(path)
}
saveRDS(spPred, file.path(path, paste0("predictions.rds")))

} else {
  # load(paste0(dataFolder, "/modelOutputs/diversity/alpineDiversity.RData"))
  # load(paste0(dataFolder, "/modelOutputs/diversity/borealDiversity.RData"))
  
  alpine <- readRDS("Data/modelOutputs/diversity/alpine/predictions.rds")
  boreal <- readRDS("Data/modelOutputs/diversity/boreal/predictions.rds")
  temperate <- readRDS("Data/modelOutputs/diversity/temperate/predictions.rds")
  
  bees <- readRDS("Data/modelOutputs/diversity/Bees/predictions.rds")
  butterflies <- readRDS("Data/modelOutputs/diversity/Butterflies/predictions.rds")
  hoverflies <- readRDS("Data/modelOutputs/diversity/Hoverflies/predictions.rds")
}

#############################
# Diverisity at ASOData locations
#############################
source("pipeline/webForInterractions/webPlotsForModels.R")
asoDatasf <- readRDS(file = paste0(dataFolder, "/plantDataFolder/formattedData/asoData.RDS"))%>%
  mutate(
    simpleScientificName = coalesce(
      str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    )) %>%
  dplyr::filter(simpleScientificName %in% c(interractionMatrix$species))%>%
  st_transform(newCrs)# Select the species in the interraction matrix

source("pipeline/dataImport/environmentalDataImport.R")
names(environmentalCovariates)[14] <- "bio.sq"
names(environmentalCovariates)[16] <- "bm.Intsq"

# Reclassify the levels of the environmental covariates
levels(environmentalCovariates$landCover)[[1]][,2][c(18, 19, 23, 24, 25, 26)] <- "Other forest"
levels(environmentalCovariates$landCover)[[1]][,2][29:37] <- "Others"

# Reclassify the levels of environmentalCovariates randomised
levels(environmentalCovariatesRandomized$landCover)[[1]][,2][c(18, 19, 23, 24, 25, 26)] <- "Other forest"
levels(environmentalCovariatesRandomized$landCover)[[1]][,2][29:37] <- "Others"

# Estract the covariates and diversity values at these locations
asoDataDiv <- terra::extract(allRast, 
                             unique(vect(st_geometry(asoDatasf))))
asoDataCovs <- terra::extract(environmentalCovariates, 
                              unique(vect(st_geometry(asoDatasf))))

asoDataMeadows <- cbind(asoDataDiv,
                        asoDataCovs,
                        crds(unique(vect(st_geometry(asoDatasf)))))%>%
  st_as_sf(., coords = c("x", "y"), crs = newCrs)

asoDataMeadows <- asoDataMeadows[complete.cases(asoDataMeadows%>% st_drop_geometry()), ]%>%
  rowwise()%>%
  mutate(sumShannon = sum(c_across(c(3, 6, 9)))) %>%
  mutate(cluster = ifelse(Latitude > 0 & bio1 > 0, 1,
                          ifelse(Latitude < 0 & bio1 > 0, 2,
                                 ifelse(Latitude > 0 & bio1 < 0, 3, 4))))%>%
  dplyr::group_by(cluster)%>%
  dplyr::mutate(alpineReference = alpine_simpson/ max(allZones_simpson),
         borealReference = boreal_simpson/ max(allZones_simpson),
         temperateReference = temperate_simpson/ max(allZones_simpson),
         allReference = allZones_simpson / max(allZones_simpson),
         maxReference = max(allZones_simpson),
         minReference = min(allZones_simpson))%>%
  ungroup()


#save the asoData meadows for plot later
save(asoDataMeadows,
     file = paste0(dataFolder, "/markdownResults/image/asoDataMeadows.RData"))
Figure 3: Indicator values at a) ASO meadows and b) National scale with the ASO meadows (black rectangles) on a 500m resolution. For each grid cell on the national scale, assign it a geoclimatic zone and scale its diversity index by its respective reference value presented in Table 4.

4.2.2. Threshold value for defining good ecological condition (if relevant)

Given the indicator values estimated in Section 1.4.2.1, we define threshold values for defining a good ecosystem condition. These choose three threshold values and assess the impacts of the choice of threshold on the ecosystem condition:

  • Threshold of 0.5

  • Threshold of 0.6 (used by Framstad et al. (2022) in defining condition of forest and mountain ecosystem in Norway)

  • Threshold of 0.75

Any location or grid cell or meadow with its indicator greater than this threshold is deemed to be in a good condition, otherwise in a poor condition. Figure 4 presents the conditions of pollinators in ASO meadows.

Code
interestedVars <- c("alpine_simpson", "boreal_simpson", "temperate_simpson", "allZones_simpson")

selectedVars <- lapply(interestedVars, function(x){
  allRast[[x]]
}) %>%
  rast()%>%
  as.data.frame(na.rm = TRUE, xy = TRUE)%>%
  st_as_sf(., coords = c("x", "y"), crs = newCrs)%>%
  cbind(., terra::extract(environmentalCovariates, 
                                         unique(vect(st_geometry(.)))))%>%
  dplyr::mutate(allReference = ifelse(Latitude > 0 & bio1 > 0, (allZones_simpson/7714.728),
                          ifelse(Latitude < 0 & bio1 > 0, (allZones_simpson/7552.012),
                                 ifelse(Latitude > 0 & bio1 < 0, (allZones_simpson/7714.728), 
                                        (allZones_simpson/6021.358)))))%>%
  dplyr::mutate(condition75 = ifelse(allReference > 0.75, "Good", "Poor"),
                condition60 = ifelse(allReference > 0.60, "Good", "Poor"),
                condition50 = ifelse(allReference > 0.5, "Good", "Poor"))%>%
  select(allReference, condition50, condition60, condition75)

save(selectedVars,
     file = paste0(dataFolder, "/markdownResults/image/ecosystemConditions.RData"))

What does this mean and how do we interpret these thresholds. Does these conditions make sense and how do we interpret those?

Figure 4: The condition of pollinators in semi-natural ecosystem (ASO meadows). The green colored points represent the meadows with good ecosystem conditions and the red colored points represent the meadows with poor ecosystem conditions.

4.2.3. Spatial resolution and validity

The reference values are estimated on \(500\) meter resolution and it is fixed across all areas. We also present the pollinator condition on the national scale in Figure 5.

Figure 5: The condition of pollinators across several habitat types in Norway with reference to the conditions in semi-natural ecosystem (ASO meadows). The green colored points represent the meadows with good ecosystem conditions and the red colored points represent the meadows with poor ecosystem conditions.

5. Uncertainties

Instead of resampling our data to define the uncertainties (Framstad et al. 2022), we use the 95% from the pollinator intensity and plant species occurrence probability.

This can be done, but I need some time to do this.

6. References

Adjei, Peprah Kwaku, Philip Mostert, Jorge Sicacha Parada, Emma Skarstein, and Robert B O’Hara. 2023. “The Point Process Framework for Integrated Modelling of Biodiversity Data.” arXiv e-Prints, arXiv–2311.
Bachl, Fabian E., Finn Lindgren, David L. Borchers, and Janine B. Illian. 2019. inlabru: An R Package for Bayesian Spatial Modelling from Ecological Survey Data.” Methods in Ecology and Evolution 10: 760–66. https://doi.org/10.1111/2041-210X.13168.
Blangiardo, Marta, and Michela Cameletti. 2015. Spatial and Spatio-Temporal Bayesian Models with r-INLA. John Wiley & Sons.
Chamberlain, Scott, Vijay Barve, Dan Mcglinn, Damiano Oldoni, Peter Desmet, Laurens Geffert, and Karthik Ram. 2024. Rgbif: Interface to the Global Biodiversity Information Facility API. https://CRAN.R-project.org/package=rgbif.
Dorazio, Robert M. 2014. “Accounting for Imperfect Detection and Survey Bias in Statistical Analysis of Presence-Only Data.” Global Ecology and Biogeography 23 (12): 1472–84. https://doi.org/10.1111/geb.12216.
Dormann, Carsten F., Bernd Gruber, and Jochen Fruend. 2008. “Introducing the Bipartite Package: Analysing Ecological Networks.” R News 8 (2): 8–11.
Fithian, William, Jane Elith, Trevor Hastie, and David A. Keith. 2015. “Bias Correction in Species Distribution Models: Pooling Survey and Collection Data for Multiple Species.” Methods in Ecology and Evolution 6 (4): 424–38. https://doi.org/10.1111/2041-210X.12242.
Framstad, Erik, Anders L Kolstad, Signe Nybø, Joachim Töpper, and Vigdis Vandvik. 2022. “The Condition of Forest and Mountain Ecosystems in Norway. Assessment by the IBECA Method.”
GBIF.Org User. 2024a. “Occurrence Download.” The Global Biodiversity Information Facility. https://doi.org/10.15468/DL.ZSEX2M.
———. 2024b. “Occurrence Download.” The Global Biodiversity Information Facility. https://doi.org/10.15468/DL.AWRNR2.
Gómez-Rubio, Virgilio. 2020. Bayesian Inference with INLA. Chapman; Hall/CRC.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. “An Introduction to Statistical Learning.”
Hickler, Thomas, Katrin Vohland, Jane Feehan, Paul A Miller, Benjamin Smith, Luis Costa, Thomas Giesecke, et al. 2012. “Projecting the Future Distribution of European Potential Natural Vegetation Zones with a Generalized, Tree Species-Based Dynamic Vegetation Model.” Global Ecology and Biogeography 21 (1): 50–63.
Hill, Mark O. 1973. “Diversity and Evenness: A Unifying Notation and Its Consequences.” Ecology 54 (2): 427–32.
Illian, Janine, Antti Penttinen, Helga Stoyan, and Dietrich Stoyan. 2008. Statistical Analysis and Modelling of Spatial Point Patterns. John Wiley & Sons.
Isaac, Nick JB, Marta A Jarzyna, Petr Keil, Lea I Dambly, Philipp H Boersch-Supan, Ella Browning, Stephen N Freeman, et al. 2020. “Data Integration for Large-Scale Models of Species Distributions.” Trends in Ecology & Evolution 35 (1): 56–67. https://doi.org/10.1016/j.tree.2019.08.006.
Jost, Lou. 2006. “Entropy and Diversity.” Oikos 113 (2): 363–75.
———. 2007. “Partitioning Diversity into Independent Alpha and Beta Components.” Ecology 88 (10): 2427–39.
Keith, Heather, Bálint Czúcz, Bethanna Jackson, Amanda Driver, Emily Nicholson, and Joachim Maes. 2020. “A Conceptual Framework and Practical Structure for Implementing Ecosystem Condition Accounts.”
Mostert, Philip S, and Robert B O’Hara. 2023. “PointedSDMs: An r Package to Help Facilitate the Construction of Integrated Species Distribution Models.” Methods in Ecology and Evolution 14 (5): 1200–1207.
Museum, Natural History. 2023a. “Data Portal Query on "HOSTS" Created at 2023-06-28 10:30:49.598237 PID.” https://doi.org/https://doi.org/10.5519/qd.izm3kg02.
———. 2023b. “Subset of "HOSTS - a Database of the World’s Lepidopteran Hostplants" (Dataset) PID.” https://doi.org/https://doi.org/10.5519/havt50xw.
Oksanen, Jari, Gavin L. Simpson, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O’Hara, et al. 2022. Vegan: Community Ecology Package. https://CRAN.R-project.org/package=vegan.
Poggio, Laura, Luis M De Sousa, Niels H Batjes, Gerard BM Heuvelink, Bas Kempen, Eloi Ribeiro, and David Rossiter. 2021. “SoilGrids 2.0: Producing Soil Information for the Globe with Quantified Spatial Uncertainty.” Soil 7 (1): 217–40.
Rahayu, Agavia Kori, Risti Endriani Arhatin, Jordan Gacutan, Firdaus Agung, Jessica Pingkan, Annisya Rosdiana, and Irfan Yulianto. 2024. “Optimising Marine Basic Spatial Units (MBSU) for Ocean Accounting Using Empirical Data from Saleh Bay, Indonesia.” One Ecosystem 9: e125578.
Rasmussen, Claus, Yoko L Dupont, Henning Bang Madsen, Petr Bogusch, Dave Goulson, Lina Herbertsson, Kate Pereira Maia, et al. 2021. “Evaluating Competition for Forage Plants Between Honey Bees and Wild Bees in Denmark.” Plos One 16 (4): e0250056.
Robinson, Gaden S., Phillip R. Ackery, Ian Kitching, George W Beccaloni, and Luis M. Hernández. 2023. “HOSTS - a Database of the World’s Lepidopteran Hostplants.” Natural History Museum. https://doi.org/10.5519/HAVT50XW.
Rue, Håvard, Sara Martino, and Nicholas Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations (with Discussion).” Journal of the Royal Statistical Society B 71: 319–92.
Thukral, Ashwani K. 2017. “A Review on Measurement of Alpha Diversity in Biology.” Agricultural Research Journal 54 (1).
Zhou, Jizhong, Beicheng Xia, David S Treves, L-Y Wu, Terry L Marsh, Robert V O’Neill, Anthony V Palumbo, and James M Tiedje. 2002. “Spatial and Resource Factors Influencing High Microbial Diversity in Soil.” Applied and Environmental Microbiology 68 (1): 326–34.
Zuur, AF. 2017. “Beginner’s Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with r-Inla: Using Glm and Glmm Volume i.” Hightland Statistics Ltd., Sl OCLC 973745327: 61.

7. Datasets

Here, we describe each of the dataset used for the modelling. We refer the readers to the dataset description provided on the GBIF website.

Table 5: Datasets obtained on selected plant species from GBIF, the type of dataset, total number of occurrence records in each dataset and its percentage in the total number of occurrence records. The dataset with type sampling_event is treated as presence-absence, occurrence as presence-only and insectMonitoring as count data.
Dataset Link to description
Bumble bees collected in a large-scale field experiment in power line clearings, southeast Norway https://www.gbif.org/dataset/19fe96b0-0cf3-4a2e-90a5-7c1c19ac94ee
Bumblebees and butterflies in Norway https://www.gbif.org/dataset/19fe96b0-0cf3-4a2e-90a5-7c1c19ac94ee
Ecofact https://www.gbif.org/dataset/19fe96b0-0cf3-4a2e-90a5-7c1c19ac94ee
Entomological collections, UiB [link](https://www.gbif.org/dataset/19fe96b0-0cf3-4a2e-90a5-7c1c19ac94ee)
Entomology collection, UiT Tromsø Museum NA
Entomology, Oslo (O) UiO NA
Freshwater benthic invertebrates ecological collection NTNU University Museum NA
Freshwater pelagic invertebrates ecological collection NTNU University Museum NA
Insect collection, UiT, University Museum (TSZ). Insect labeling project and PhD-duty-work NA
Insects of the Forest-Tundra Ecotone (ForTunE) NA
Invertebrate collections, UiB NA
Jordal NA
Lepidoptera collection, South Norway NA
Mapping and monitoring of the Glanville fritillary Melitaea cinxia NA
NINA insect database NA
NORSC - Sciaroidea, UiT Tromsø Museum NA
National insect monitoring in Norway https://www.gbif.org/dataset/19fe96b0-0cf3-4a2e-90a5-7c1c19ac94ee
Norwegian Biodiversity Information Centre - Other datasets NA
Norwegian Species Observation Service NA
Observation.org, Nature data from around the World NA
Occurrence data from various smaller projects in Norway NA
Saproxylic insects caught in window traps and hatched from polypores in small and large old forests in southern Norway https://www.gbif.org/dataset/19fe96b0-0cf3-4a2e-90a5-7c1c19ac94ee
Solitary bees collected in a large-scale field experiment in power line clearings, southeast Norway NA
Terrestrial and limnic invertebrates systematic collection, NTNU University Museum NA
iNaturalist Research-grade Observations NA
naturgucker NA

Here, I intend to link all the datasets to their correct version on the GBIF repository.

7.1 Pollinator datasets

Table 6: Datasets obtained on pollinators from GBIF, the type of dataset, total number of occurrence records in each dataset and its percentage in the total number of occurrence records. The dataset with type sampling_event is treated as presence-absence, occurrence as presence-only and insectMonitoring as count data.
Dataset name Data type No. of observations Percent
Bumble bees collected in a large-scale field experiment in power line clearings, southeast Norway SAMPLING_EVENT 5016 0.28
Bumblebees and butterflies in Norway SAMPLING_EVENT 31112 1.75
Ecofact OCCURRENCE 2 0.00
Entomological collections, UiB OCCURRENCE 2757 0.16
Entomology collection, UiT Tromsø Museum OCCURRENCE 2 0.00
Entomology, Oslo (O) UiO OCCURRENCE 63124 3.56
Freshwater benthic invertebrates ecological collection NTNU University Museum SAMPLING_EVENT 11559 0.65
Freshwater pelagic invertebrates ecological collection NTNU University Museum SAMPLING_EVENT 17 0.00
Insect collection, UiT, University Museum (TSZ). Insect labeling project and PhD-duty-work OCCURRENCE 2048 0.12
Insects of the Forest-Tundra Ecotone (ForTunE) OCCURRENCE 216 0.01
Invertebrate collections, UiB OCCURRENCE 14 0.00
Jordal OCCURRENCE 48 0.00
Lepidoptera collection, South Norway OCCURRENCE 613 0.03
Mapping and monitoring of the Glanville fritillary Melitaea cinxia OCCURRENCE 785 0.04
NINA insect database OCCURRENCE 21342 1.20
NORSC - Sciaroidea, UiT Tromsø Museum OCCURRENCE 6886 0.39
National insect monitoring in Norway insectMonitoring 610146 34.41
Norwegian Biodiversity Information Centre - Other datasets OCCURRENCE 21 0.00
Norwegian Species Observation Service OCCURRENCE 971031 54.75
Observation.org, Nature data from around the World OCCURRENCE 4636 0.26
Occurrence data from various smaller projects in Norway OCCURRENCE 7 0.00
Saproxylic insects caught in window traps and hatched from polypores in small and large old forests in southern Norway SAMPLING_EVENT 122 0.01
Solitary bees collected in a large-scale field experiment in power line clearings, southeast Norway SAMPLING_EVENT 2699 0.15
Terrestrial and limnic invertebrates systematic collection, NTNU University Museum OCCURRENCE 32654 1.84
iNaturalist Research-grade Observations OCCURRENCE 6456 0.36
naturgucker OCCURRENCE 99 0.01

Out of the 26 datasets obtained from GBIF, over half of the total occurrence reports for the pollinators were from the Norwegian species Observation Service, a third of the total occurrence report were from the National insect monitoring in Norway, and the other 24 datasets cover 22 percent of the total pollinator occurrences.

Distribution of bees, butterflies and hoverfies in the merged Presence-only and National insect monitoring in Norway datasets. The merged Presence-only dataset is colored by the taxon, and the National insect monitoring dataset is colored by the presence (1) or absence (0) value.

Figure 6: Distribution of bees, butterflies and hoverfies caight in window traps and those collected in large-scale field experiment in power line clearings. The observations are colored by the presence (1) or absence (0) value.

After processing the data, it is important that we check that there are only information on bees from the following datasets: Solitary bees collected in a large-scale field experiment in power line clearings (southeast Norway), Bumble bees collected in a large-scale field experiment in power line clearings (southeast Norway), as can be seen in Figure 6 and Figure 7.

Figure 7: Distribution of bees, butterflies and hoverfies NTNU university Museum collection and large scale field experiment in Norway. The observations are colored by the presence (1) or absence (0) value.

We also check that we have information on only bees and butterflies from the Bumblebees and butterflies in Norway dataset, displayed in Figure 8.

Figure 8: Distribution of bees, butterflies and hoverfies in the Bumblebees and butterflies in Norway and freshwater pelagic invertebrates ecological collection from the NTNU University Museum datasets. The observations are colored by the presence (1) or absence (0) value.

7.2. Plant datasets

From our query for datasets from GBIF, we got a total of 30 datasets which we used to fit the ISDM for the plants. Out of these occurrence records, about 86 percent of these records were from the Norwegian species Observation Service.

Table 7: Datasets obtained on selected plant species from GBIF, the type of dataset, total number of occurrence records in each dataset and its percentage in the total number of occurrence records. The dataset with type sampling_event is treated as presence-absence, occurrence as presence-only and insectMonitoring as count data.
Dataset name Data type No. of observations Percent
ARKO strandeng OCCURRENCE 274 0.09
Ecofact OCCURRENCE 51 0.02
Effects of vegetation clearing on vascular plants in power line clearings southeast Norway SAMPLING_EVENT 779 0.24
Jordal OCCURRENCE 5434 1.70
Monitoring data of natural and man-made semi-natural meadows in and around Oslo, Norway 2018-2021 SAMPLING_EVENT 2310 0.72
Norwegian Species Observation Service OCCURRENCE 288837 90.51
Observation.org, Nature data from around the World OCCURRENCE 2993 0.94
Occurrence data from various smaller projects in Norway OCCURRENCE 775 0.24
Overvåking av semi-naturlig eng (ASO) SAMPLING_EVENT 1606 0.50
Overvåking av åpen grunnlendt kalkmark i Oslofjordområdet OCCURRENCE 1781 0.56
Pl@ntNet automatically identified occurrences OCCURRENCE 3265 1.02
Pl@ntNet observations OCCURRENCE 443 0.14
Stabbetorp - Floristiske registreringer 2016 OCCURRENCE 1209 0.38
Vascular Plant Herbarium, Oslo (O) UiO OCCURRENCE 1006 0.32
Vascular Plant Herbarium, UiB OCCURRENCE 45 0.01
Vascular Plants, Observations, Oslo (O) OCCURRENCE 192 0.06
Vascular plant herbarium (KMN) UiA OCCURRENCE 299 0.09
Vascular plant herbarium TRH, NTNU University Museum OCCURRENCE 659 0.21
Vascular plant herbarium, The Arctic University Museum of Norway (TROM) OCCURRENCE 208 0.07
Vascular plants in power line clearings and the nearby forest, southeast Norway SAMPLING_EVENT 269 0.08
Vegetation data with and without experimental warming, alpine Finse 2000, 2004, 2011 SAMPLING_EVENT 148 0.05
iNaturalist Research-grade Observations OCCURRENCE 6509 2.04
naturgucker OCCURRENCE 28 0.01
Figure 9: Distribution of plant species in the merged presence-only dataset.
Figure 10: Distribution of plant species in the ASO data. The observations are colored by the presence (1) or absence (0) value.

7.3. Covariates

Code
#########################################################################
## Load bioclimatic variables
#########################################################################

#BIO1 = Annual Mean Temperature


if(!file.exists(paste0(dataFolder, "/Rasters/environmentalCovariates.tiff"))){
  
  # convert crs to format accepted by sf, terra, and intSDM (& dependencies) 
  projCRS <- newCrs #sf::st_crs(crs)$proj4string

  
  #download the data
  #bioclim <- geodata::worldclim_country("NO", res = 10, var = "bio", path = paste0(dataFolder, "/Rasters"))

  # Import the climatic variables
  BioClimNorway <- rast(paste0(dataFolder, "/Rasters/wc2.1_country/NOR_wc2.1_30s_bio.tif"))%>%
  terra::project(., projCRS)

# Create a 1km grid cell
OnekmCellID <- BioClimNorway[[1]]
names(OnekmCellID) <- "CellID"
OnekmCellID[] <- 1:ncell(OnekmCellID)
OnekmCellID <- mask(OnekmCellID,BioClimNorway[[1]])

# Extract the latitude covariate
LatitudeRast <- BioClimNorway[[1]]
LatitudeRast[] <- crds(LatitudeRast, na.rm = FALSE)[,2]
LatitudeRast <- mask(LatitudeRast,BioClimNorway[[1]])
names(LatitudeRast) <- "Latitude"

#Extract the longitude covariate
LongitudeRast <- BioClimNorway[[1]]
LongitudeRast[] <- crds(LongitudeRast, na.rm = FALSE)[,1]
LongitudeRast <- mask(LongitudeRast,BioClimNorway[[1]])
names(LongitudeRast) <- "Longitude"

# Put them together
BioClimsForSpeciesRegions <- c(BioClimNorway[[1]],
                               OnekmCellID,
                               LatitudeRast, 
                               LongitudeRast)

names(BioClimsForSpeciesRegions)[1] <- "bio1"


# Extract the points within the spatial points
#source("pipeline/dataImport/importPollinatorDataset.R")
speciesDataImported <- readRDS(paste0(dataFolder, "/pollinatorDataFolder/speciesDataImported.RDS"))

PointsBeesAndButterfliesAndHoverfliesUsed <- speciesDataImported[["species"]]%>%
  do.call("rbind", .)%>%
  sf::st_as_sf(., coords = c('X', 'Y'), crs = newCrs)%>%
  dplyr::select(acceptedScientificName, Taxon)

rm("speciesDataImported")
gc()
#PointsBeesAndButterfliesAndHoverfliesUsed <- PointsBeesAndButterfliesAndHoverflies$mergedDatasetPO%>%
#  st_transform(newCrs)%>%
#  vect()

#Extract values for the Bioclims and PointsBeesAndButterfliesAndHoverflies
climaticVarsExtracted <- terra::extract(BioClimsForSpeciesRegions,
                                        PointsBeesAndButterfliesAndHoverfliesUsed)%>%
cbind(., st_coordinates(st_as_sf(PointsBeesAndButterfliesAndHoverfliesUsed)))%>%
  st_as_sf(., coords = c('X', 'Y'), crs = projCRS)%>%
  st_join(st_as_sf(PointsBeesAndButterfliesAndHoverfliesUsed), .,join = st_nearest_feature, left = TRUE)


# Calculate the mean and SD of the bio1
Bio1Traits <- aggregate(bio1~acceptedScientificName+Taxon,
                        climaticVarsExtracted, 
                        mean)
names(Bio1Traits)[3] <- "mBio1"

Bio1Traits$sdBio1 <- aggregate(bio1~acceptedScientificName+Taxon,
                               climaticVarsExtracted,
                               sd)[,3]
vals <- Bio1Traits$sdBio1
vals[is.na(vals)] <- 0
Bio1Traits$sdBio1 <- vals

Bio1Traits$mLat <- aggregate(Latitude ~ acceptedScientificName+Taxon,
                               climaticVarsExtracted,
                               mean)[,3]

Bio1Traits$sdLat <- aggregate(Latitude ~ acceptedScientificName+Taxon,
                             climaticVarsExtracted,
                             sd)[,3]
vals <- Bio1Traits$sdLat 
vals[is.na(vals)] <- 0
Bio1Traits$sdLat  <- vals

# Community estimates for each 1 Km grid
Bio1Traits$RegCom <-aggregate(CellID~acceptedScientificName+Taxon,
                                     climaticVarsExtracted, 
                              function(x)length(unique(x)))[,3]

roundFunction <- function(x, digits) round(x, 1)

Bio1Traits <- Bio1Traits %>%
  mutate_at(c("mBio1", "sdBio1", "mLat", "sdLat", "RegCom"), scale)#%>%
 #mutate_at(c("mBio1", "sdBio1", "mLat", "sdLat", "RegCom"), roundFunction)

write.csv(Bio1Traits, file = paste0(dataFolder, "/bioTraits.csv"))

# We randomize the traits for use in cross validation.
# We create some background locations for this purpose
BackgroundBioClim <- BioClimsForSpeciesRegions[]
BackgroundBioClim <- data.frame(BackgroundBioClim[complete.cases(BackgroundBioClim),])
BackgroundBioClim$Lat <- BackgroundBioClim$Latitude
#BackgroundBioClim$Occurrence <- 0
#head(BackgroundBioClim)

RandomSampleBioClimTraits <- Bio1Traits[sample(1:nrow(Bio1Traits[,3:7]),
                                               nrow(BackgroundBioClim),
                                               replace = TRUE),
                                        3:7]


# Merge these estimates with the climatic variables extracted
climaticVarsExtractedNew <- merge(climaticVarsExtracted,
                                  Bio1Traits ,
                                  by = c("acceptedScientificName", "Taxon"))%>%
  na.omit()%>%
  select(-c(acceptedScientificName, Taxon, ID, Longitude))

# Climatic variables with randomised covariates

climaticVarsExtractedRandomized <- cbind(BackgroundBioClim, 
                                         RandomSampleBioClimTraits)%>%
  na.omit()%>%
  as.data.frame() %>%
  st_as_sf(., coords = c('Longitude', 'Lat'), crs = projCRS)#%>%
  #rbind(., climaticVarsExtracted)

# COnvert the sf object to a raster
# climaticVarsExtracted <- climaticVarsExtracted%>%
#   st_as_sf()%>%
#   rast()

# define region to download as bounding box of buffered and projected mesh/regionGeometry
regionGeometryBuffer <-st_union(if(exists("meshForProject")) {
  meshForProject |>
    inlaMeshToSf()
}
else regionGeometry) |>
  st_buffer(200) |>
  st_transform(projCRS) |> 
  st_bbox() |> 
  st_as_sfc() |>
  st_segmentize(dfMaxLength = 100) |> 
  vect() 

# define working project raster 
baseRaster <- terra::rast(extent = ext(regionGeometryBuffer), 
                          res = c(1, 1), 
                          crs = projCRS)

# rasterise regionGeometry
regionGeometryRast <- regionGeometry |>
  st_as_sf() |>
  st_transform(projCRS) |> 
  vect() |>
  terra::rasterize(baseRaster, FUN = "mode") 

#Write a funtion to perform the rasterisation
# rasterFunction <- function (x) terra::rasterize(climaticVarsExtracted, 
#                                                 regionGeometryRast, 
#                                                 x, 
#                                                 FUN = "sum",
#                                                 background = NA)

myMax <- function(x){
  max(x, na.rm = TRUE)
}

# Creating the environmental covariates we need to fit the models
# Trying to create the them as the same resolution with the bioclim raster
# climaticVarsExtractedWithoutBioclimVars <- climaticVarsExtractedNew
 #climaticVarsExtractedWithoutBioclimVarsRandomized <- climaticVarsExtractedRandomized#[-c(1:3)]

# Create a function to rasterise the covariates
 rasterFunction <- function (i, sfObject){
   terra::rasterize(x = st_coordinates(sfObject), 
                    #y = regionGeometryRast, 
                    y = BioClimNorway,
                    values = c( sfObject[[i]]), 
                    FUN = "mean")#,
                    #update = TRUE, cover = TRUE)
 }
 
 # Create a function to rasterise the covariates by filling the missing values
 # mbio1 and mLats with its sd counterparts have missing values for the raster 
 # created for the model in pointedSDMs.
 # To deal with this, we use the inlabru's missing value function to fill the 
 # missing values
 rasterFunctionWithMissingValues <- function(oldRaster){
   # create an sf object with the coordinates of the bioclim vars
   sfObject <- st_as_sf(as.data.frame(crds(BioClimsForSpeciesRegions)),
                        coords = c("x", "y"),
                        crs = projCRS
   )
   
   # use the coordinates and the oldRaster with missing values
   # to get the new raster
  fillingMissingValues <- covariateMissingValues(sfObject, 1,  "last", oldRaster)
   
  # Create a rester with the new values
   terra::rasterize(x = st_coordinates(sfObject), 
                    y = BioClimsForSpeciesRegions,
                    values = c(fillingMissingValues), 
                    FUN = "mean")
   
 }
 
 # library(tidyverse)
 # library(sf)
 # library(stars)

 #rt <- st_rasterize(climaticVarsExtracted %>% dplyr::select(Latitude, geometry))

 # Create rasters for environmental covariates 
foo_sf <- list()
for(i in 1:(length(climaticVarsExtractedNew)-1)){
  x <- names(climaticVarsExtractedNew)[i]
  oldRaster <- rasterFunction(x, climaticVarsExtractedNew)
  foo_sf[[i]] <- rasterFunctionWithMissingValues(oldRaster)
}



# Create rasters for randomised covariates
foo_sf_random <- list()
for(i in 1:(length(climaticVarsExtractedRandomized)-1)){
  x <- names(climaticVarsExtractedRandomized)[i]
  oldRaster <- rasterFunction(x, climaticVarsExtractedRandomized)
  foo_sf_random[[i]] <-  rasterFunctionWithMissingValues(oldRaster)
}

# Add the other BioClim variables to the new raster
fooBioclim_sf <- fooBioclim_sf_randomized <- list()
namesOfBioClimVars <- c("bio1", "CellID", "Latitude")
for(i in 1:length(namesOfBioClimVars)){
  fooBioclim_sf[[i]] <- BioClimsForSpeciesRegions[[i]]
  fooBioclim_sf_randomized[[i]] <- BioClimsForSpeciesRegions[[i]]
}


# Now we load land corine cover and habitat heterogeneity
if(!file.exists(paste0(dataFolder, "/Rasters/landCoverClassification.tiff"))){
# Note that the function corineReclassification will require you
# to open the ZIP file with the land corine dataset.
# For this work, the file is located at the Data/50599
  landCover <- corineReclassification()%>%
  terra::project(., projCRS)

 # For some reason, NaNs are added as factors in the model,
  # so I set that as water bodies since 
  levels(landCover)[[1]][,2][is.na(levels(landCover)[[1]][,2])] <- "Water bodies"
  levels(landCover)[[1]][,2][28] <- "Moors and heathland"
  #values(landCover)[,1][is.nan(values(landCover)[,1])] <- 48
  
# save the land cover covariate
# It takes a long time to finish the projection
writeRaster(landCover, 
            paste0(dataFolder, "/Rasters/landCoverClassification.tiff"), 
            overwrite=TRUE)
} else {
  # Load the already formatted copy
  landCover <- rast(paste0(dataFolder, "/Rasters/landCoverClassification.tiff"))
}

soilPh <- rast(paste0(dataFolder, "/Rasters/soil_ph.tiff"))%>%
  terra::project(., projCRS)

soilCoarseFraction <- rast(paste0(dataFolder, "/Rasters/soil_coarse_fraction.tiff"))%>%
  terra::project(., projCRS)

soilMoisture <- rast(paste0(dataFolder, "/Rasters/soil_moisture.tiff"))%>%
  terra::project(., projCRS)

soilOrganicCarbon <- rast(paste0(dataFolder, "/Rasters/soil_organic_carbon.tiff"))%>%
  terra::project(., projCRS)

# Put all the lists together
allFoo_sf <- c(fooBioclim_sf,
               foo_sf[-c(1:3)],
               landCover,
               soilPh,
               soilCoarseFraction,
               soilOrganicCarbon,
               soilMoisture)

allFoo_sf_random <- c(fooBioclim_sf_randomized,
                      foo_sf_random[-c(1:3)],
                      landCover,
                      soilPh,
                      soilCoarseFraction,
                      soilOrganicCarbon,
                      soilMoisture)

# set the names to use for the raster
namesToUse <- c(#namesOfBioClimVars, 
                names(climaticVarsExtractedNew)[-length(names(climaticVarsExtractedNew))],
                "landCover",
                "soilPh",
                "soilCoarseFraction",
                "soilOrganicCarbon",
                "soilMoisture")


# Put all the parameters together
#z <- 0
parametersCropped <- allFoo_sf |> 
  lapply(function(x) {
    #print(z +1)
    #print(is.raster(x))
    regionExt <- as.polygons(terra::project(regionGeometryBuffer, x), extent = TRUE)
    # Crop each covariate to extent of regionGeometryBuffer
    out <- terra::crop(x, regionExt, snap = "near")
    # Project all rasters to baseRaster and combine
    
    
    if(is.factor(x)) {
      if("Water bodies" %in% levels(out)[[1]][,2]){
      values(out)[,1][is.nan(values(out)[,1])] <- 48
      levels(out) <- levels(x)
      }
      # project categorical rasters
      out <- terra::project(out, baseRaster, method = "mode")
      levels(out) <- levels(x)  # reassign levels 
      out
    } else {
      # project & scale continuous rasters
      ifel(is.na(regionGeometryRast), NA,
           terra::project(out, baseRaster)) |>
        scale()
    }
    }) |>  
  rast() |>  # combine raster layers
  #scale() |>  # scale raster layers
  setNames(namesToUse)

parametersCroppedRandomized <- allFoo_sf_random  |> 
  lapply(function(x) {
    #print(z +1)
    #print(is.raster(x))
    regionExt <- as.polygons(terra::project(regionGeometryBuffer, x), extent = TRUE)
    # Crop each covariate to extent of regionGeometryBuffer
    out <- terra::crop(x, regionExt, snap = "near")
    # Project all rasters to baseRaster and combine
    
    
    if(is.factor(x)) {
      # I make sure that NaNs are not part of the land Cover values
      if("Water bodies" %in% levels(out)[[1]][,2]){
        values(out)[,1][is.nan(values(out)[,1])] <- 48
        levels(out) <- levels(x)
      }
      # project categorical rasters
      out <- terra::project(out, baseRaster, method = "mode")
      levels(out) <- levels(x)  # reassign levels 
      out
    } else {
      # project & scale continuous rasters
      ifel(is.na(regionGeometryRast), NA,
           terra::project(out, baseRaster)) |>
        scale()
    }
  }) |>  
  rast() |>  # combine raster layers
  #scale() |>  # scale raster layers
  setNames(namesToUse)


#plot(parametersCropped)
#plot(parametersCroppedRandomized)

# Define the covariates needed
#r <- disagg(parametersCropped, fact = 30, method = "bilinear")
#parameters <- parametersCropped
# The prediction data is in a bounded box, and for landCover, we have values within
# the entire bounded box. We need to mask the covariates by the regionGeometry
parameters <- parametersCropped

parameters <- regionGeometry %>%
  st_transform(., newCrs)%>%
  vect( )%>%
  mask(parameters, .)

# define geometries to combine with prediction 
geometries <- xyFromCell(parameters , seq(ncell(parameters))) %>% 
  as.data.frame() %>% 
  st_as_sf(coords = c("x", "y"), crs = newCrs) 

pars <- parameters %>% 
  as.data.frame(na.rm = FALSE)%>% 
  select(bio1)%>%
  bind_cols(geometries)%>% # transform the prediction data to the units used in model runs
  filter(rowSums(is.na(.)) != (ncol(.)-1 ))

PolyBio1 <- poly(pars$bio1, 2)[,1]
PolyBio2 <- poly(pars$bio1, 2)[,2]

pars <- pars %>%
  dplyr::mutate(polyBio1 = PolyBio1,
                polyBio2 = PolyBio2) %>%
  st_sf()%>%
  st_transform(., newCrs)

spPred <- rasterize(pars, baseRaster, names(pars)[-length(pars)])
#parametersCropped
parameters$polyBio1 <- spPred$polyBio1
parameters$polyBio2 <- spPred$polyBio2
parameters$polyBio1Mbio <- spPred$polyBio1 * parameters$mBio1
parameters$polyBio2Mbio <- spPred$polyBio2 * parameters$mBio1
 parameters$bio1sq <- (parameters$bio1)^2
 parameters$bmInt <- parameters$bio1 * (parameters$mBio1)
 parameters$bmIntsq <- (parameters$bio1)^2 * parameters$mBio1
 parameters$sdmInt <-parameters$sdBio1 * parameters$mBio1
 parameters$mlatInt <- parameters$Latitude * parameters$mLat


environmentalCovariates <- parameters 

# save the covariates
writeRaster(environmentalCovariates, 
            paste0(dataFolder, "/Rasters/environmentalCovariates.tiff"), 
            overwrite=TRUE)


# Calculate the quadratic for the randomized covariates
parameters <- parametersCroppedRandomized
#parametersCropped
parameters$polyBio1 <- spPred$polyBio1
parameters$polyBio2 <- spPred$polyBio2
parameters$polyBio1Mbio <- spPred$polyBio1 * parameters$mBio1
parameters$polyBio2Mbio <- spPred$polyBio2 * parameters$mBio1
 parameters$bio1sq <- (parameters$bio1)^2
 parameters$bmInt <- parameters$bio1 * (parameters$mBio1)
 parameters$bmIntsq <- (parameters$bio1)^2 * parameters$mBio1
 parameters$sdmInt <- parameters$sdBio1 * parameters$mBio1
 parameters$mlatInt <- parameters$Latitude * parameters$mLat
environmentalCovariatesRandomized <- parameters 
# save the covariates
writeRaster(environmentalCovariatesRandomized, 
            paste0(dataFolder, "/Rasters/environmentalCovariatesRandomized.tiff"), 
            overwrite=TRUE)

} else {
  environmentalCovariates   <- rast(paste0(dataFolder, "/Rasters/environmentalCovariates.tiff"))
  environmentalCovariatesRandomized <- rast(paste0(dataFolder, "/Rasters/environmentalCovariatesRandomized.tiff"))
}

We used several covariates to fit both the pollinator and plant ISDMs. The names of the covariates, their description and source are presented in Table 8. Additionally, we include the indication of whether the covariate is a habitat, trait or climatic variables as well as which of the ISDMs it was used to fit in Table 8. The rasters of the covariates are on \(1\) km resolution. For the annual temperature, we create a raster of the orthogonal polynomial of degree \(2\) (using the poly function in the stats R-package) and use them to fit the pollinator distribution instead of the original annual temperature values. The soil covariates were obtained from Poggio et al. (2021).

Table 8: Description of covariates used to fit the integrated species distribution models and the source the covariates were retrieved from. The table indicated what type of covariate (habitat, trait and climatic) and which of the distributions it was used for (pollinator and/or plant) ISDM
Name Description Source Habitat/ Trait/ Climatic Disribution
bio1 Annual mean temperature geodata R-package Climatic Plant
polyBio1 First column from orthogonal polynomial of annual mean temperature to degree of 2 Calculated from bio1 raster Climatic Pollinator
polyBio2 Second column from orthogonal polynomial of annual mean temperature to degree of 2 Calculated from bio1 raster Climatic Pollinator
Latitude Latitudinal gradient extracted from the bio1 raster Extracted from bio1 raster Trait Pollinator
mBio1 Mean pollinator annual temperature per each pollinator species Calculated from bio1 raster and data downloaded from GBIF Trait Pollinator
sdBio1 Standard deviation of pollinator annual temperature per each pollinator species Calculated from bio1 raster and data downloaded from GBIF Trait Pollinator
mLat Mean pollinator latitude per each pollinator species Calculated from latitude raster and data downloaded from GBIF Trait Pollinator
sdmInt Interraction between sdBio1 and mBio1 NA NA Pollinator
poly1bmInt Interraction between mBio1 and polyBio1 NA NA Pollinator
poly2bmInt Interraction between mBio1 and polyBio2 NA NA Pollinator
landCover Land cover raster Land corine data Habitat Pollinator, Plant
soilPh Ph of soil Poggio et al. (2021) Habitat Plant
soilCoarseFraction Soil coarse fraction Poggio et al. (2021) Habitat Plant
soilOrganicCarbon Soil orgainic carbon Poggio et al. (2021) Habitat Plant
soilMoisture Soil moisture Poggio et al. (2021) Habitat Plant

We present the distribution of the covariates described in Table 8 here in Figure 11.

Figure 11: Covariates used to fit the ISDM. The resolution of the covariates are on a 1 km resolution.

To ascertain the trait effect on the pollinator distribution, we randomise the triat covariates in Table 8. These randomised covariates (displayed in Figure 12) are used to fit ISDMs, and compared with the ISDMs fitted with covariates in Figure 11 .

Figure 12: Randomised trait covariates used for the cross-validation to assess the trait effect on the distribution of pollinators in the study region. The resolution of the covariates are on a 1 km resolution.

8. Spatial units

We describe the spatial units of our analysis based on the UN System of Environmental Accounting (SEEA EA; see Rahayu et al. (2024) and structure of ecosystem accounting). The Norwegian territory defines our ecosystem accounting area (EEA) and the vegetation zones (described by feltkurskompendium biologisk mangfold terrestrisk biologi and store norske leksikon) define the ecosystem types and our ecosystem assets (EA) are defined by the geo-climatic regions (described in Section 1.4.1). We also use a gridded EA (500 x 500 m) in describing our basic spatial units (BSU).

9. Analyses

We fit an ISDM using the point process framework (Illian et al. 2008) to our processed datasets described in Section 1.7. ISDMs are various observation models that are linked together by a shared ecological process (Isaac et al. 2020; Dorazio 2014; Fithian et al. 2015).

9.1. General overview of ISDMs

In this subsection, we present an overview of the ISDMs for general understanding of this document. For further details on ISDM, refer to Isaac et al. (2020), Dorazio (2014) and Fithian et al. (2015). The models are described using the point process framework as described in Adjei et al. (2023). Additionally, we present the description of these models under the assumption that we are fitting a multi-species ISDM using presence-only and presence-absence data.

In this subsection, we describe the framework by defining a “multi-group” model, where the group can be on any taxonomic level such as species, genus, order, etc. In Section 1.9.2 and Section 1.9.3, we will define what the group represent in the ISDMs fitted there.

We model the presence-only data as a Poisson point process model (Illian et al. 2008) with mean intensity \(\lambda_i(s)\) for each group \(i = 1, \ldots, S\), where \(S\) is the number of groups. This mean intensity modelled as: \[ \ln(\lambda_i(s)) = \beta_{0i} + \beta_{ji} X_j(s) + \omega_i(s) + \eta(s), \tag{1}\] where \(\beta_{0i}\) is the group-specific intercept, \(\beta_{ji}\) is the group-specific effect of covariate \(j\), \(X_j(s)\) is the \(j^{th}\) covariate field, \(\omega_i(s)\) is the group-specific spatial autocorrelation field and \(\eta(s)\) is the bias field. \(\omega_i(s)\) and \(\eta(s)\) are modeled as zero mean Gaussian random field (i.e. \(\omega_i(s) \sim N(0, \Sigma)\), where \(\Sigma\) is a Mat’ern covariance matrix with variance \(\sigma_{i}^2\) and range \(\kappa_i\) and \(\eta(s) \sim N(0, \Sigma_\eta)\), where \(\Sigma_\eta\) is a Mat’ern covariance matrix with variance \(\tau^2\) and range \(\kappa_\eta\)). Unless otherwise stated, we will assume that all the \(S\) groups share the same parameters in the covariance matrix (\(\Sigma\)).

We model the presence-absence data with a logistic regression model. Let \(y_i(s)\) be a binary value at location \(s\) with \(0\) indicating absence of group \(i\) and \(1\) indicating presence of group \(i\). Then \(y_i(s) \sim \text{Bernoulli}(\Psi_i(s))\) with: \[ cloglog(\Psi_i(s)) = \beta_{0i} + \beta_{ji} X_j(s) + \omega_i(s) \tag{2}\] where the parameters \(\beta_0\), \(\beta_1\) and \(\omega\) are defined in Equation 1. We are assuming in Equation 2 that the presence-absence data is not biased.

All the ISDMs were fitted using the R-package PointedSDMs (Mostert and O’Hara 2023). PointedSDMs is a wrapper around inlabru (Bachl et al. 2019), which used the integrated nested Laplace approximation method (implemented in the R-package INLA; Rue, Martino, and Chopin (2009)) to fit hierarchical models that can be expressed as linear Gaussian models. The reader is referred to Zuur (2017), Gómez-Rubio (2020), Blangiardo and Cameletti (2015) for further reading.

To be able to fit the model, the study region is discretised by creating a mesh over the surface. The mesh we used in this study is presented in Figure 13.

Figure 13: Mesh used in fitting the models with INLA.

9.2. ISDM for pollinators

Using the model framework defined in Section 1.9.1, we fit an ISDM to the pollinator dataset obtained from GBIF using the order: Bees, Butterflies and Hoverflies (use their right name here) as the group. We use latitude, annual mean temperature (bio1), trait mean temperature (mBio1), trait standard deviation temperature (sdBio1), trait mean Latitude (mLat), traut standard deviation of latitude (sdLat), annual mean temperature squared and interaction between annual temperature and trait mean temperature (bmInt) as covariates.

Code
# Load data and covariates
source("pipeline/dataImport/importPollinatorFromGBIF.R")
source("pipeline/dataImport/environmentalDataImport.R")

# Set the model
model <- PointedSDMs::startSpecies(PointsBeesAndButterfliesAndHoverflies, # list of pollinator dataset (containing both mergedDatasetPA and mergedDatasetPO) 
                                   Boundary = regionGeometry, # boundary of the study
                                   Projection = newCrs, # projection
                                   Mesh = meshForProject, #mesh used for the model
                                  speciesEnvironment = TRUE, # Should we have pollinator specific covariate effect
                                  speciesIntercept = TRUE, # TRUE treats the intercept as a random effect, instead of constrained as with a fixed effect 
                                  pointsIntercept = FALSE, #should we include intercept for each dataset
                                  pointCovariates = FALSE, #do we have covariates for the presence-only data
                                   responsePA = 'individualCount', # column name of the response values for presence-absence data
                                  # trialsPA = 'trials',
                                   spatialCovariates = envCovsForModel, # raster of spatia covariates
                                   speciesName = 'Taxon', # Names of the species in the datasets
                                   pointsSpatial = NULL, # Do not include a dataset spatial field
                                   speciesSpatial = "replicate")   # unique spatial field per species, but they share the same hyperparameters.

# Add second bias field for PO data
model$addBias(datasetNames = 'mergedDatasetPO')

9.2.1 Priors

We assume the following priors for the pollinator ISDM:

  • The probability that the standard deviation of the pollinator spatial field is greater than \(0.5\) is \(0.1\) (i.e. \(P(\sigma_\omega > 0.5) = 0.01\)).

  • The probability that the standard deviation of the bias field is greater than \(1\) is \(0.01\) (i.e. \(P(\sigma_\eta > 1) = 0.01\))

  • The probability that the spatial range of the pollinator spatial field is less than \(1\)km is \(0.01\) (i.e \(P(\kappa_\omega < 1) = 0.01\))

  • The probability that the spatial range of the pollinator spatial field is less than \(1\)km is \(0.1\) (i.e. \(P(\kappa_\eta < 1) = 0.1\))

  • effects of continuous covariates (all covariates except land cover) and the intercept for each pollinator is assumed to be from a Normal distribution with mean \(0\) and precision of \(0.01\) and Normal distribution with mean \(0\) and precision of \(2\) respectively.

Code
# hyper parameters of the spatial field (shared across species)
for(taxon in c("Bees", "Butterflies", "Hoverflies")){
  # hyper parameters of the spatial field (shared across species)
  model$specifySpatial(Species = taxon,  # define same prior for the all species
                       prior.sigma = c(0.5, 0.01),  # SD of field variation; 
                       prior.range = c(1, 0.01),  # range of spatial correlation; 
                       constr = FALSE
  )
  
  # PC prior for intercepts
  model$priorsFixed(Effect = "intercept",  # define same prior for the all species
                    Species = taxon,
                    mean.linear = 0,
                    prec.linear = 2
  )
  
  model$changeComponents(paste0("", taxon, "_bio.sq(main = ", taxon, "_bio.sq, model = ", '"', "clinear", '"', ", range = c(-10, 0))"))
  
  model$changeComponents(paste0("", taxon, "_landCover(main = ", taxon, "_landCover, model = ", '"', "factor_contrast", '"', ", hyper = list(theta = list(initial = 0, fixed = TRUE)))"))
  
}

# Prior for bias field
model$specifySpatial(Bias = 'mergedDatasetPO', # Change the prior
                     prior.sigma = c(1, 0.01),
                     prior.range = c(1, 0.01))

# prior for random effects (mesh nodes of spatial field and species intercepts)
model$specifyRandom(
  # precision parameter on how much each species' spatial field (how much they can deviate from the shared ___)
  speciesGroup = list(model = "iid", 
                      hyper = list(prec = list(initial = 0, fixed = TRUE))), # InitialValue of sigma^2 = 1/exp(1) = 0.3
  # precision parameter on the baseline species occurrence rate
  speciesIntercepts = list(initial = 0,
                           fixed = TRUE),
  
  copyBias =  list(model = "iid", 
                   hyper = list(prec = list(initial = 0, fixed = TRUE)))#list(beta = list(initial = 0, fixed = TRUE))
)  

model$changeComponents(paste0("mergedDatasetPO_biasField(main = geometry, model = mergedDatasetPO_bias_field,control.group = list(model =", '"', "iid", '"' , ", hyper = list(prec = list(initial = 0, fixed = TRUE))))"))

# Specify priors for covariate effects (continuous)
covariatesToAddEffects <- c("bio1", "bmInt", "RegCom", "mBio1", "sdBio1", "sdmInt", "Latitude", "mLat", "sdLat", "mlatInt")
#covariatesToAddEffects <- c("bio1","Latitude")
for(covs in covariatesToAddEffects){
  model$priorsFixed(Effect = covs,
                    mean.linear = 0,
                    prec.linear = 0.01)
}

9.2.2 Fit model and make predictions

We fit the model by using the Gaussian strategy of the empirical Bayes integration strategy in R-package INLA (Rue, Martino, and Chopin 2009). Note that the model fit returns effect values for each pollinator taxon.

Code
modelOptions <- list(num.threads = 4,
                     control.inla = list(int.strategy = 'eb', 
                                         diagonal = 0.001, 
                                         cmin = 0, 
                                         strategy = "gaussian",
                                         control.vb= list(enable = FALSE)), 
                     verbose = FALSE, 
                     safe = TRUE, 
                     inla.mode = "experimental")

modelRun <- PointedSDMs::fitISDM(data = model, 
                            options = modelOptions)
  
  # predictions for this model
  individualDatasetPreds <- predict(modelRun,
          data = ppxl,
          predictor = TRUE,
          n.samples = 100)

Next, we cluster all the over \(8000\) pollinator species into groups using a clustering algorithm. We explored using k-means and hierarchical clustering method. K-means clustering provides a simple approach for partitioning a data set into \(K\) distinct non-overlapping clusters such that the within-cluster-variation is small as possible (Hastie, Tibshirani, and Friedman 2009). We need to choose the number of cluster and the number of starts for the algorithm, which can have an impact on the results. The cluster is defined by the trait annual temperature (mBio1) and latitude (mLat) displayed in Figure 11. We assess what the optimal choice of the number of clusters is by varying the number of clusters from \(1\) to \(10\) and choosing the cluster with the smallest within-cluster- variation. The optimal choice of the number of clusters is \(3\). Hence, we chose \(K = 3\) and \(n.start = 20\), after we checked the possible number of clusters that yields the smallest within-cluster-variation.

Alternatively, hierarchical clustering helps us explore the choice of \(K\) clusters through a dendogram. We then cut the tree at \(k = 3\). We present the three clusters from the two clustering algorithms in Figure 14.

Figure 14: The three clusters from the A)k-means and B) hierarchical clustering algorithm based on the trait latitude (mLat) and temperature (mBio1) values. The mean values of each clister is represented by the black quare within each cluster. These mean values of mLat and mBio1 are used to make predictions for each for each cluster.
Code
if(!file.exists(paste0(dataFolder, "/pollinatorDataFolder/bioTraits.csv"))){
# Calculate the mean and SD of the bio1
speciesDataImported <- readRDS(paste0(dataFolder, "/pollinatorDataFolder/speciesDataImported.RDS"))
speciesData <- speciesDataImported[["species"]]%>%
  do.call("rbind", .)%>%
  sf::st_as_sf(., coords = c('X', 'Y'), crs = newCrs)%>%
  dplyr::select(acceptedScientificName, Taxon)

# Extract latitude and annual temperature
climaticVarsExtracted <- terra::extract(environmentalCovariates,
               speciesData)%>%
  dplyr::select(Latitude, bio1, landCover)%>%
  cbind(., st_coordinates(st_as_sf(speciesData)))%>%
  st_as_sf(., coords = c('X', 'Y'), crs = newCrs)%>%
  st_join(st_as_sf(speciesData), .,join = st_nearest_feature, left = TRUE)

# Estimate trait values
Bio1Traits <- aggregate(bio1~acceptedScientificName+Taxon,
                        climaticVarsExtracted, 
                        mean)
names(Bio1Traits)[3] <- "mBio1"

Bio1Traits$sdBio1 <- aggregate(bio1~acceptedScientificName+Taxon,
                               climaticVarsExtracted,
                               sd)[,3]

Bio1Traits$mLat <- aggregate(Latitude ~ acceptedScientificName+Taxon,
                             climaticVarsExtracted,
                             mean)[,3]

Bio1Traits$sdLat <- aggregate(Latitude ~ acceptedScientificName+Taxon,
                              climaticVarsExtracted,
                              sd)[,3]

#######################
# Cluster these data
#####################

n_clusters <- 10

# Initialize total within sum of squares error: wss
wss <- numeric(n_clusters)

set.seed(123)

n <- 10
# Look over 1 to n possible clusters
for (i in 1:n) {
  print(i)
  # Fit the model: km.out
  km.out <- kmeans(Bio1Traits[, c("mLat", "mBio1")], centers = i, nstart = 20)
  # Save the within cluster sum of squares
  wss[i] <- km.out$tot.withinss
}

# Produce a scree plot
wss_df <- tibble(clusters = 1:n, wss = wss)

scree_plot <- ggplot(wss_df, aes(x = clusters, y = wss, group = 1)) +
  geom_point(size = 4)+
  geom_line() +
  scale_x_continuous(breaks = c(2, 4, 6, 8, 10)) +
  xlab('Number of clusters')+
  geom_hline(
    yintercept = wss, 
    linetype = 'dashed', 
    col = c(rep('#000000',2),'#FF0000', rep('#000000', 7))
  )
plot(scree_plot)

# Fi the final model with 3 clusters
km.out <- kmeans(Bio1Traits[, c("mLat", "mBio1")], centers = 3, nstart = 1)
Bio1Traits$clusterID <- factor(km.out$cluster)

# Set the values of whether the species is temperate, alpine or boreal
Bio1Traits <- Bio1Traits %>%
  dplyr::mutate(zone = ifelse(clusterID ==1 , "alpine",
                              ifelse(clusterID == 2, "temperate", "boreal")))

# Merge these estimates with the climatic variables extracted
biotraits <- merge(climaticVarsExtracted,
                                  Bio1Traits ,
                                  by = c("acceptedScientificName", "Taxon"))%>%
  na.omit()

#save the results
write.csv(biotraits, file = paste0(dataFolder, "/pollinatorDataFolder/bioTraits.csv"))

#plot(biotraits$mBio1, biotraits$bio1)

} else {
  biotraits <- readr::read_csv("Data/pollinatorDataFolder/bioTraits.csv")
}

The aim is to assign the same predicted intensity (\(\lambda\)) to each species within the cluster. The means for each cluster is presented in Table 9. We name these three clusters as: 1 - temperate, 2 - boreal and 3 - alpine zones.

Table 9: Cluster means for trait temperature and latitude from the k-means clustering algorithm. The zones are defined as alpine, boreal and temperate zones.
Zones Annual temperature Latitude
Alpine -0.359 0.994
Temperate 1.599 -1.040
Boreal 0.716 -0.449

We predict the intensity of pollinators within each zone, by assigning a fixed mBio1 and mLat value (with the value from Table 9 and properly estimating the interaction values as well) whilst keeping the other covariate rasters (see Figure 11) the same. Thus, we obtain nine (\(9\)) predictions maps foe each combination of taxon and zone. Each pollinator species within the combination of taxon and zone (Table 10) are assigned their respective predicted intensity .

Table 10: A sample of the pollinator species, pollinator taxon and the cluster zones.
Pollinator species Taxon Cluster zones
Gillmeria pallidactyla (Haworth, 1811) Butterflies boreal
Dioryctria abietella (Denis & Schiffermüller), 1775 Butterflies alpine
Coleophora vacciniella Herrich-Schäffer, 1861 Butterflies boreal
Ypsolopha dentella (J.C.Fabricius, 1775) Butterflies alpine
Mycetophila strigatoides (Landrock, 1927) Hoverflies boreal
Diptera Hoverflies boreal
Clossiana euphrosyne (Linnaeus, 1758) Butterflies boreal
Psychoda phalaenoides (Linnaeus, 1758) Hoverflies boreal
Bolitophila bimaculata Zetterstedt, 1838 Hoverflies boreal
Aclista bitensis (Kieffer, 1909) Bees boreal
Platycheirus ramsarensis Goeldlin de Tiefenau, Maibach & Speight, 1990 Hoverflies boreal
Agriphila straminella (Denis & Schiffermüller), 1775 Butterflies alpine
Platypalpus agilis (Meigen, 1822) Hoverflies alpine
Amphipoea Billberg, 1820 Butterflies alpine
Noctua pronuba Linnaeus, 1758 Butterflies alpine

9.2.3 Pollinator ISDM validation

Additionally, we perform validation to assess the trait effect on the pollinator distribution across the study region (Mostert and O’Hara 2023). We re-fit the ISDM in Section 1.9.2 using the randomised trait covariate (Figure 12. We compare the deviance information criterion (DIC) of the two models (the one fitted with the true covariates in Section 1.9.2 and the re-fitted model described here). The model with the lowest DIC score is deemed to be the best. If the model with the randomised covariates is chosen as the best, we can conclude that the trait covariates are not significant. If the model fitted in Section 1.9.2 is deemed best, we can conclude that the trait covariates are significant.

9.3. ISDM for plant species

There are \(54\) plant species we aim to estimate their species distribution. Looking at the number of occurrence records in Table 7, developing a multi-species ISDM will produce stable estimates for the species with fewer occurrence records like Hieracium umbellatum. However, we do not have the required memory allocations to fit all the \(54\) species together, so we split the total number of species into six groups, with nine species in each group. The splitting is done by sorting the total occurrence of each species in the ASO data and assigning the species to each group so that the highest occurrence is in the first group, the second is in the second group and the list continues.

Code
# PointedSDMs takes a longer time to fit for many species
# The trick to to break it into smaller groups
# The split will be done by the number of present speccies in each location
# I will split it nGroups time
sortedSpecies <- table(asoDatasf$acceptedScientificName)%>%
  as.data.frame()%>%
  dplyr::arrange(desc(Freq))%>%
  select(Var1)%>%
  c()

# Set the number of groups
nGroups <- 10

#split the species into groups of 10 species in each
plantSpeciesGroup <- split(sortedSpecies$Var1, 
                           seq(1, 
                               ceiling(length(sortedSpecies$Var1)/nGroups)))

After splitting the data into the groups, we fit a multi-species ISDMs (described in Section 1.9.1) to each group.

Similar to the model the pollinator ISDM, we fitted an ISDM to the \(54\) plant species. Here, we use species as groups and soil ph, soil organic carbon, soil coarse fraction, land cover, soil moisture and annual temperature as covariates.

Code
speciesModelShared <- PointedSDMs::startSpecies(formatPlantData,
                                                Boundary = regionGeometry, 
                                   Projection = newCrs, 
                                   Mesh = meshForProject,
                                   responsePA = 'individualCount',
                                   speciesEnvironment = TRUE,
                                   speciesIntercept = TRUE,
                                   spatialCovariates = envCovsForPlantSpeciesModel, 
                                   speciesName = 'simpleScientificName',
                                   pointsIntercept = FALSE,
                                   pointsSpatial = NULL, # Do not include a dataset spatial field
                                   speciesSpatial = "replicate")  

# Add bias spatial field for the PO dataset
speciesModelShared$addBias(datasetNames = 'mergedPlantsPO')

9.3.1 Priors

We assume the following priors for the plant species ISDM:

  • The probability that the standard deviation of the plant species spatial field is greater than \(0.1\) is \(0.1\) (i.e. \(P(\sigma_\omega > 0.1) = 0.1\)).

  • The probability that the standard deviation of the bias field is greater than \(0.1\) is \(0.1\) (i.e. \(P(\sigma_\eta > 0.1) = 0.1\))

  • The probability that the spatial range of the plant species spatial field is greater than \(50\) is \(0.1\) (i.e \(P(\kappa_\omega < 50) = 0.1\))

  • The probability that the spatial range of the plant species spatial field is greater than \(50\) is \(0.1\) (i.e. \(P(\kappa_\eta < 50) = 0.1\))

Code
# hyper parameters of the spatial field (shared across species)
speciesModelShared$specifySpatial(Species = TRUE,  # define same prior for the all species
                     prior.sigma = c(1, 0.1),  
                     prior.range = c(5, 0.1))  


speciesModelShared$specifySpatial(Bias = TRUE, # Change the prior
                     prior.sigma = c(0.6, 0.1),
                     prior.range = c(5, 0.1))

# prior for random effects (mesh nodes of spatial field and species intercepts)
speciesModelShared$specifyRandom(
  # precision parameter on how much each species' spatial field (how much they can deviate from the shared ___)
  speciesGroup = list(model = "iid", 
                      hyper = list(prec = list(prior = "pc.prec",
                                               param = c(0.1, 0.1)))),  
  # precision parameter on the baseline species occurrence rate
  speciesIntercepts = list(prior = 'pc.prec',
                           param = c(0.1, 0.1)))  

9.3.2 Model fitting and predictions

We fit the model by using the adaptive strategy of the composite design integration strategy in INLA. We predict the occurrence probability for each plant species by transforming the plant species with the inverse logit function.

Code
# specify the model options for INLA
modelOptions <- list(num.threads = 4,
                     control.inla = list(int.strategy = 'ccd', 
                                         diagonal = 0.001, 
                                         cmin = 0, 
                                         strategy = "adaptive",
                                         control.vb= list(enable = FALSE)), 
                     verbose = FALSE, 
                     safe = TRUE, 
                     inla.mode = "experimental")

# Species-specific spatial effects model
speciesSharedEst <- PointedSDMs::fitISDM(data = speciesModelShared, 
                            options = modelOptions)

# I proceed with the prediction of occupancy probabilities
# over the entire region
individualDatasetPreds <- predict(speciesSharedEst,
                                  data = ppxl,
                                  predictor = TRUE,
                                  n.samples = 500)

9.4 Species Interractions

We visualize the bipartite networks between the pollinator taxon and the plant species using data from Rasmussen et al. (2021) for hoverflies, Museum (2023a) and Museum (2023b) for butterflies and Robinson et al. (2023) for bees.

Code
# Interraction web
# Import and format the pollinator datasets
if(!file.exists(paste0(dataFolder, "/interractionsData/interractionMatrix.csv"))){
FinalSelectionOfLepidopteraAndBeesAndHoverflies <- allPollinatorsDataset(dataFolder, 
                                                                         print = FALSE)
Bio1Traits <- read.csv(paste0(dataFolder,"/bioTraits.csv"))
NiNIndicators <- read.csv(paste0(dataFolder, "/interractionsData/Indikatorarter NiN eng.csv"), sep = ";")
NiNIndicatorsGenera <- do.call(rbind,lapply(unique(NiNIndicators$Genus),function(x)NiNIndicators[NiNIndicators$Genus %in% x,][1,]))
FinalSelectionOfLepidopteraAndBeesAndHoverflies$PlantGenusPollinator <- with(FinalSelectionOfLepidopteraAndBeesAndHoverflies, paste(PlantGenus,ValidSpeciesName))
OneInteractionPerPlantGenusPollinator <- do.call(rbind,lapply(unique(FinalSelectionOfLepidopteraAndBeesAndHoverflies$PlantGenusPollinator),function(x)FinalSelectionOfLepidopteraAndBeesAndHoverflies[FinalSelectionOfLepidopteraAndBeesAndHoverflies$PlantGenusPollinator%in% x,][1,]))
InteractionsInIdealMeadow <- OneInteractionPerPlantGenusPollinator[OneInteractionPerPlantGenusPollinator$PlantGenus %in% NiNIndicatorsGenera$Genus,]
InteractionsInIdealMeadowWithTraits <- merge(InteractionsInIdealMeadow,
                                             Bio1Traits, 
                                             by.x= "ValidSpeciesName",
                                             by.y = "validScien")

# FInd out the species in the aso data that match up to the specific genus
if(!exists("asoDatasf")) asoDatasf <- readRDS(file = paste0(dataFolder, "/plantDataFolder/formattedData/asoData.RDS"))

# Merge the asoData genus to the species data
asoDatasf <- asoDatasf%>%
  mutate(
    simpleScientificName = coalesce(
      #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
      str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    )
  ) 

interractionData <- merge(InteractionsInIdealMeadowWithTraits, 
      as.data.frame(asoDatasf[, c("simpleScientificName", "genus")])[, 1:2],
      by.x = "PlantGenus",
      by.y = "genus")

# save the interractions in ideal meadow
write.csv(InteractionsInIdealMeadowWithTraits, 
          paste0(dataFolder, "/interractionsData/InteractionsInIdealMeadowWithTraits.csv"))

write.csv(interractionData, 
          paste0(dataFolder, "/interractionsData/interractionData.csv"))

# Create the plant pollinator for network
#PlantPollinatorsForNetwork <- InteractionsInIdealMeadowWithTraits[c("ValidSpeciesName","PlantGenus")]
PlantPollinatorsForNetwork <- interractionData[c("Taxon.x","simpleScientificName")]
names(PlantPollinatorsForNetwork)[1:2] <- c("higher","lower")
PlantPollinatorsForNetwork$freq <- 1
PlantPollinatorsForNetwork$webID <- 1
dfForWeb <- PlantPollinatorsForNetwork[c("lower","higher","webID","freq")]

# Create webs from dataframe
WebReady <- bipartite::frame2webs(dfForWeb, varnames = c("lower", "higher", "webID", "freq"))

# save for later use
save(WebReady,
     file = paste0(dataFolder, "/interractionsData/webPlot.RData"))

if(modelPlots){
plotweb(WebReady[[1]])
}


# Calculte the interraction matrix
interractionMatrix <- as.data.frame(WebReady[[1]]) 
interractionMatrix$species <- rownames(interractionMatrix)
rownames(interractionMatrix) <- NULL
# save the results
write.csv(interractionMatrix, 
          paste0(dataFolder, "/interractionsData/interractionMatrix.csv"))

} else {
  load(paste0(dataFolder,"/interractionsData/webPlot.RData"))
  interractionMatrix <- readr::read_csv(paste0(dataFolder,"/interractionsData/interractionMatrix.csv"))
  
  if(modelPlots){
    plotweb(WebReady[[1]])
  }
}

# Calculate the
interractionProb <- interractionMatrix[, 2:4]%>%
  proportions(.)%>%
  dplyr::mutate(species = interractionMatrix$species)%>%
  mutate(
    simpleScientificName = coalesce(
      #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
      str_extract(species, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    ),
    # Replace space with underscore in simpleScientificName
    species = gsub("-", "", gsub("C","", gsub(" ", "_", simpleScientificName)))
  ) 

colnames(interractionProb)[2] <- "Butterflies"

The data we got only had host plats for \(240\) out of the \(8000\) (\(3\%\)) pollinator species modeled here. To use information on the pollinator host in describing the ecosystem condition, we use aggregate the pollinator species to the order level as shown in Figure 15. Using that information, we create a network using the frame2webs function in the R-package bipartite (Dormann, Gruber, and Fruend 2008).

Code
  load("../data/webPlot.RData")
  bipartite::plotweb(WebReady[[1]])
Figure 15: Plot of pollinator taxon and their plant host networks. The pollinators are resolved to their order taxon and the plant hosts are resolved to the species level.

The values from the network for the interaction between the pollinator taxon and plant hosts are stored in a data frame such that each row represents the weight of each pollinator taxon for a given plant host. We then scale these values for each plant host such that their sum is \(1\), and use these values as weights in describing the pollinator diversity in Section 1.9.5.

9.5. Community diversity

At the habitat level, diversity is the most widely used component in describing the characteristics of a community (Thukral 2017). The alpha diverity has two components: species richness and equitability indices (Thukral 2017). A community with equal number of different species within the community will have a higher eveness index; a community dominated by one or few species in terms of the number of individuals will have a lower eveness index (Thukral 2017). A simple explanation to the diversity indices are presented here.

We use the Hill numbers, such as species richness, Shannon entropy and Simpson index, to describe the community diversity (Jost 2007, 2006; Thukral 2017). The Hill numbers are defined as (Hill 1973): \[ {}^qD = \bigg(\sum_{i = 1}^S p_i^q \bigg)^{\frac{1}{1-q}}, \tag{3}\] where \(p_i\) is the relative abundance of pollinator species \(i\) with order \(q\). The order of the diversity (\(q\)) indicates the sensitivity to common and rare species, with greater values of \(q\) (specifically \(q > 1\)) indicates a disproportional favor to common species, and lower values of \(q\) (specifically \(q < 1\)) indicates a disproportional favor to rare species. Hill numbers with order \(0\) is the species richess, Hill numbers with order \(1\) is the Shannon index and Hill numbers with order \(2\) is the Simpson index (Thukral 2017).

The relative abundance within each prediction grid cell \(s\) (\(p_i(s)\)) for Equation 3 is estimated as:

\[ \begin{split} \text{Relative abundance of pollinator species} &\propto \text{Estimated pollinator intensity} \\ &\times \text{Interraction weight} \\ & \times \text{plant species occurrence probability}\\ \implies p_i(s) &= \frac{ \frac{1}{K} \sum_k \lambda_i(s) \times w_{ik} \times \Psi_k(s)}{\sum_i \frac{1}{K} \sum_k \lambda_i(s) \times w_{ik} \times \Psi_k(s)}, \end{split} \tag{4}\] where \(w_{ik}\) is the weight of the interaction between pollinator species \(i\) and plant host \(k\), \(\Psi_k(s)\) is the occurrence probability of plant host \(k\) and \(\lambda_i(s)\) is the intensity of pollinator species \(i\).

We used the R-package vegan (Oksanen et al. 2022) to estimate the species richness, Shannon diversity and inverse Simpson index by passing the relative proportion Equation 4 into the function specnumber and diversity for species richness and both Shannon and Simpson index respectively.

Code
# Estimate diversity indices based on their taxon: either Bees, Butterflies or Hoverflies
# The function also takes into consideration whether you want to return the diversity estimate of a particular pollinator species
# If pollinatorSpeciesNames = NULL, then the function considers all the pollinator species within a zone or taxon.
# speciesForTaxon is the dataframe that contains the plant species prediction results. This is formatted from the fitted ISDMS to the ASO plant species
# zones is the zone we are interested in returning results for. It takes the values of either "all", "alpine", "boreal", "temperate" 
# predRast is the prediction raster created in the masterScript
# silent indicates whether we want to print the species names or not

estimateZonalDiversity <- function(pollinatorSpeciesNames = NULL, #Names of the pollinator species (should be the simplescientific name)
                              speciesForTaxon, # The data with the plant species results
                              zones, 
                              predRast, # The prediction raster
                              silent = FALSE){
  
  # Check if the zones specified  is what we expect
  if(!zones %in% c("all", "alpine", "boreal", "temperate" )) stop("Taxa must either be 'all', 'alpine', 'boreal', 'temperate' ")
  # Filter out the pollinator species we want
  if(!zones %in% "all"){
    dat <- speciesForTaxon%>%
      dplyr::filter(zone %in% zones)%>%
      na.omit()
  } else if (zones == "all"){
    
    dat <- speciesForTaxon%>%
      dplyr::filter(zone %in% c("alpine", "boreal", "temperate"))%>%
      na.omit()  
  }
  
  returnAllPrediction <- is.null(pollinatorSpeciesNames )
  
  if(is.null(pollinatorSpeciesNames )){
    pollinatorSpeciesNames <- unique(dat$simpleScientificName)
  } else {
    dat <- dat %>%
      dplyr::filter(simpleScientificName %in% pollinatorSpeciesNames)
  }
  
  if(!silent) print(pollinatorSpeciesNames)
  
  
  #indSpeciesDiversity <- lapply(seq_along(pollinatorSpeciesNames), function(x){
  sp <- pollinatorSpeciesNames#[x]
  
  
  #get unique taxa
  focalTaxa <- unique(dat$Taxon)
  allZones <- c("alpine", "boreal", "temperate")
  
  if(!zones %in% "all"){
    #for now, we ensure that dat has only one pollinator group
    if(length(unique(dat$zone))>1) stop("Select one zone only")
    
    # Load the intensity of the species from the zone
    spPred <- list()
    for(j in 1:length(focalTaxa)){
      path <- paste(c(dataFolder, "modelOutputs", "pollinators", zones, focalTaxa[j]), collapse = "/")
      results <- readRDS(file.path(path, paste0("predictions", ".rds")))
      results$intensity <- exp(results$mean)
      
      # Transform the intensity to the original scale
      diversity <-  lapply(plantSpecies, function(plantSp){
        divs <-  results$intensity * speciesData[[plantSp]] * interractionProb[interractionProb$species %in% plantSp, dat$Taxon[j]]
        #divs$overall <- app(divs, function(i) calculateRichness(i))
      })|>  
        rast() |>  # combine raster layers
        #scale() |>  # scale raster layers
        setNames(plantSpecies)
      
      
      # define geometries to combine with prediction 
      geometries <- xyFromCell(diversity, seq(ncell(diversity))) %>% 
        as.data.frame() %>% 
        st_as_sf(coords = c("x", "y"), crs = newCrs)
      
      # Create the prediction data 
      PredictionData <- diversity %>% 
        as.data.frame(na.rm = FALSE)%>%
        mutate(richness = vegan::specnumber(across(plantSpecies)),
               simpson = vegan::diversity(across(plantSpecies), "invsimpson"),
               shannon= vegan::diversity(across(plantSpecies)))%>%
        #mutate(averageIntensity = sum(across(plantSpecies), na.rm = TRUE)/length(plantSpecies))%>%
        reduce(cbind) %>% 
        bind_cols(geometries) %>% 
        st_sf()%>%
        st_transform(., newCrs)%>% # transform the prediction data to the units used in model runs
        filter(rowSums(is.na(.)) != (ncol(.)-1)) # Now we take all the rows with sum of NAs equal to the number of columns of the dataframe minus the 
      
      names(PredictionData) <- c(plantSpecies, "speciesNumber", "simpson", "shannon", "geometry")
      
      PredictionData <- PredictionData%>%
        mutate(averageIntensity = rowMeans(pick(all_of(plantSpecies))))%>%
        select(plantSpecies, "speciesNumber", "simpson", "shannon","averageIntensity","geometry")
      
      spPred[[j]] <- rasterize(PredictionData, predRast, names(PredictionData)[-length(names(PredictionData))])
      
    }
    
    names(spPred) <- focalTaxa
  } else if(zones == "all"){
    # Load the intensity of the species from the zone
    
    spPred <- lapply(seq_along(allZones), function(x){
      spPred <- list()
      for(j in 1:length(focalTaxa)){
        path <- paste(c(dataFolder, "modelOutputs", "pollinators", allZones[x], focalTaxa[j]), collapse = "/")
        results <- readRDS(file.path(path, paste0("predictions", ".rds")))
        results$intensity <- exp(results$mean)
        
        # Transform the intensity to the original scale
        diversity <-  lapply(plantSpecies, function(plantSp){
          divs <-  results$intensity * speciesData[[plantSp]] * interractionProb[interractionProb$species %in% plantSp, dat$Taxon[j]]
          #divs$overall <- app(divs, function(i) calculateRichness(i))
        })|>  
          rast() |>  # combine raster layers
          #scale() |>  # scale raster layers
          setNames(plantSpecies)
        
        
        # define geometries to combine with prediction 
        geometries <- xyFromCell(diversity, seq(ncell(diversity))) %>% 
          as.data.frame() %>% 
          st_as_sf(coords = c("x", "y"), crs = newCrs)
        
        # Create the prediction data 
        PredictionData <- diversity %>% 
          as.data.frame(na.rm = FALSE)%>%
          mutate(richness = vegan::specnumber(across(plantSpecies)),
                 simpson = vegan::diversity(across(plantSpecies), "invsimpson"),
                 shannon= vegan::diversity(across(plantSpecies)))%>%
          #mutate(averageIntensity = sum(across(plantSpecies), na.rm = TRUE)/length(plantSpecies))%>%
          reduce(cbind) %>% 
          bind_cols(geometries) %>% 
          st_sf()%>%
          st_transform(., newCrs)%>% # transform the prediction data to the units used in model runs
          filter(rowSums(is.na(.)) != (ncol(.)-1)) # Now we take all the rows with sum of NAs equal to the number of columns of the dataframe minus the 
        
        names(PredictionData) <- c(plantSpecies, "speciesNumber", "simpson", "shannon", "geometry")
        
        PredictionData <- PredictionData%>%
          mutate(averageIntensity = rowMeans(pick(all_of(plantSpecies))))%>%
          select(plantSpecies, "speciesNumber", "simpson", "shannon","averageIntensity","geometry")
        
        spPred[[j]] <- rasterize(PredictionData, predRast, names(PredictionData)[-length(names(PredictionData))])
        
      }  
      names(spPred) <- focalTaxa
      
      return(spPred)
    })
    names(spPred) <- allZones
    
  }
  
  
  
  # If we are interested in any particular pollinator species
  if(!returnAllPrediction ){
    return(spPred)
  } else if(!zones %in% "all"){
    # Find our how many species are in each group
    groupCounts <- dat %>%
      group_by(Taxon, zone)%>%
      summarise(groupCounts = n())
    
    # create a raster for the final results
    #allSpPred <- spPred$Hoverflies$Ajuga_pyramidalis
    # define geometries to combine with prediction 
    geometries <- crds(spPred$Hoverflies$Ajuga_pyramidalis)%>%
      as.data.frame()%>%
      st_as_sf(., coords = c("x", "y"), crs = newCrs)
    
    allSpPred <- lapply(focalTaxa, function(x){
      spPred[[x]]$averageIntensity * groupCounts[groupCounts$Taxon %in% x, 3]
    })%>%
      do.call("cbind", .)%>%
      rowSums()
    
    indSpeciesPred <- lapply(focalTaxa, function(x){
      ret <- spPred[[x]]$averageIntensity%>%
        as.data.frame(na.rm = TRUE)%>%
        dplyr::mutate(intensity = averageIntensity/allSpPred) %>%
        dplyr::select(intensity)%>%
        replicate((as.numeric(unlist(groupCounts[groupCounts$Taxon %in% x, 3]) )), ., simplify = FALSE)%>%
        do.call("cbind", .)
      
      names(ret) <- dat$acceptedScientificName[dat$Taxon == x]
      
      return(ret)
      
    })%>%
      do.call("cbind", .)%>%
      as.data.frame()%>%
      dplyr::mutate(richness = vegan::specnumber(across(dat$acceptedScientificName)),
                    simpson = vegan::diversity(across(dat$acceptedScientificName), "invsimpson"),
                    shannon= vegan::diversity(across(dat$acceptedScientificName)))%>%
      dplyr::select(richness, shannon, simpson)%>% 
      bind_cols(geometries) %>% 
      st_sf()%>%
      st_transform(., newCrs)%>% # transform the prediction data to the units used in model runs
      filter(rowSums(is.na(.)) != (ncol(.)-1))
    
    names(indSpeciesPred) <- c("richness", "shannon", "simpson","geometry")
    
    
    spPred <- rasterize(indSpeciesPred, predRast, c("richness", "shannon", "simpson"))
    return(spPred)
    
    
    
  } else if(zones %in% "all"){
    # Find our how many species are in each group
    groupCounts <- dat %>%
      group_by(Taxon, zone)%>%
      summarise(groupCounts = n())
    
    # create a raster for the final results
    #allSpPred <- spPred$Hoverflies$Ajuga_pyramidalis
    # define geometries to combine with prediction 
    geometries <- crds(spPred$alpine$Bees$Ajuga_pyramidalis)%>%
      as.data.frame()%>%
      st_as_sf(., coords = c("x", "y"), crs = newCrs)
    
    allSpPred <- list()
    for(i in seq_along(allZones)){
      allSpPred[[i]] <- lapply(focalTaxa, function(x){
        spPred[[allZones[i]]][[x]]$averageIntensity * groupCounts[groupCounts$Taxon %in% x, 3]
      })%>%
        do.call("cbind", .)%>%
        rowSums()
    }
    names(allSpPred) <- allZones
    
    indSpeciesPred <- list()
    for(i in seq_along(allZones)){
      indSpeciesPred[[i]] <- lapply(focalTaxa, function(x){
        ret <- spPred[[i]][[x]]$averageIntensity%>%
          as.data.frame(na.rm = TRUE)%>%
          dplyr::mutate(intensity = averageIntensity/allSpPred[[i]]) %>%
          dplyr::select(intensity)%>%
          replicate((as.numeric(unlist(groupCounts[(groupCounts$Taxon %in% x) & (groupCounts$zone %in% allZones[i]), 3]) )), ., simplify = FALSE)%>%
          do.call("cbind", .)
        
        names(ret) <- dat$acceptedScientificName[dat$Taxon == x & dat$zone == allZones[i]]
        
        return(ret)
        
        
      })
    }
    names(indSpeciesPred) <- allZones
    
    indSpeciesPred <- indSpeciesPred%>%
      purrr::flatten() %>%
      do.call("cbind", .)%>%
      as.data.frame()%>%
      dplyr::mutate(richness = vegan::specnumber(across(dat$acceptedScientificName)),
                    simpson = vegan::diversity(across(dat$acceptedScientificName), "invsimpson"),
                    shannon= vegan::diversity(across(dat$acceptedScientificName)))%>%
      dplyr::select(richness, shannon, simpson)%>% 
      bind_cols(geometries) %>% 
      st_sf()%>%
      st_transform(., newCrs)%>% # transform the prediction data to the units used in model runs
      filter(rowSums(is.na(.)) != (ncol(.)-1))
    
    names(indSpeciesPred) <- c("richness", "shannon", "simpson","geometry")
    
    
    spPred <- rasterize(indSpeciesPred, predRast, c("richness", "shannon", "simpson"))
    return(spPred)
  }
  
  
}  

# Load the functions needed to estimate the diversity
source("pipeline/richness/utils/estimateZonalDiversity.R")
source("pipeline/richness/utils/estimateTaxonDiversity.R")

# Getting diversity indices

if(!exists("biotraits")) source("pipeline/predictions/pollinatorZoneGroupings.R")
speciesForTaxon <- biotraits %>%
  dplyr::select(acceptedScientificName, Taxon, zone)%>%
  group_by(Taxon, zone)%>%
  distinct()%>%
  ungroup()%>%
  mutate(
    simpleScientificName = coalesce(
      #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
      str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    ))%>%
  na.omit()

# Load all the plant predictions result
if(!exists("interractionMatrix")) source("pipeline/webForInterractions/webPlotsForModels.R")

asoDatasf <- readRDS(file = paste0(dataFolder, "/plantDataFolder/formattedData/asoData.RDS"))%>%
  mutate(
    simpleScientificName = coalesce(
      #redList$species[match(acceptedScientificName, redList$GBIFName)],  # Match redList species
      str_extract(acceptedScientificName, "^[A-Za-z]+\\s+[a-z]+")        # Extract binomial name
    )) %>%
  dplyr::filter(simpleScientificName %in% c(interractionMatrix$species)) #

plantSpecies <- unique(asoDatasf$simpleScientificName)

plantSpecies <- sapply(plantSpecies, function(x){
  rt <- stringr::str_split(x, " ")[[1]]
  taxon <- paste(rt[1], rt[2], sep = "_") 
  return(taxon)
}) %>%
  c()

plantSpecies <- as.character(plantSpecies)

# for(sp in species){
#   print(sp)
#   spPred <-   allPlantPredictions[[sp]]%>%
#     select("mean", "sd", "q0.025", "q0.5", "q0.975", "median", "Probabilities")#%>%
#     #mutate(Probabilities = myinvcloglog(mean))
#   #individualDatasetPreds[[which(species %in% sp)]] <- spPred
#   spPred <- rasterize(spPred, predRast, c("mean", "sd", "q0.025", "q0.5", "q0.975", "median", "Probabilities"))
#   path <- paste0(dataFolder, "/modelOutputs/plants/", sp)
#   
#   if(!file.exists(path)){
#     dir.create(path)
#   }
#   saveRDS(spPred, file.path(path, paste0("predictions.rds")))
# }

##################################################
# Plant Species results
#########################################

speciesData <- lapply(paste0(dataFolder, "/modelOutputs/plants/", plantSpecies), function(x){
 # try(list.files(x, pattern = paste0("predictions.rds"), recursive = TRUE, full.names = TRUE))
  preds <- readRDS(list.files(x, pattern = paste0("predictions.rds"), recursive = TRUE, full.names = TRUE))
  return(preds$Probabilities)
})|>  
  rast() |>  # combine raster layers
  #scale() |>  # scale raster layers
  setNames(plantSpecies)

#######################################################
# PlantData
#######################################################
#pollinatorSpeciesNames <- unique(speciesForTaxon$simpleScientificName)

# populate the species distributions


###############################################
# Diversity for alpine regions
##############################################
if(rerun){
# pollinatorSpeciesNames <- speciesForTaxon%>%
#   dplyr::filter(zone == "alpine")%>%
#   na.omit()%>%
#   dplyr::select(simpleScientificName)%>%
#   unique()%>%
#   c()
  
######################################
# ZONAL DIVERSITY ESTIMATES
#####################################
zone <- c("alpine", "boreal", "temperate")

for(sp in zone){
spPred <- estimateZonalDiversity(pollinatorSpeciesNames = NULL,
                  speciesForTaxon =  speciesForTaxon,
                  zones = sp,
                  predRast = predRast,
                  silent = FALSE)   

path <- paste0(dataFolder, "/modelOutputs/diversity/", sp)
spPred <- rasterize(spPred, predRast, names(spPred))
if(!file.exists(path)){
  dir.create(path)
}
saveRDS(spPred, file.path(path, paste0("predictions.rds")))
}  


# Make predictions for all the zones
sp <- "all"
spPred <- estimateZonalDiversity(pollinatorSpeciesNames = NULL,
                            speciesForTaxon =  speciesForTaxon,
                            zones = sp,
                            predRast = predRast,
                            silent = FALSE)   

path <- paste0(dataFolder, "/modelOutputs/diversity/", sp)
spPred <- rasterize(spPred, predRast, names(spPred))
if(!file.exists(path)){
  dir.create(path)
}
saveRDS(spPred, file.path(path, paste0("predictions.rds")))
rm("spPred")

######################################
# POLLINATOR TAXON DIVERSITY ESTIMATES
#####################################
taxa <- c("Butterflies", "Bees", "Hoverflies")

for(sp in taxa){
  spPred <- estimateTaxonDiversity(pollinatorSpeciesNames = NULL,
                                   speciesForTaxon =  speciesForTaxon,
                                   taxa = sp,
                                   predRast = predRast,
                                   silent = FALSE)   
  
  path <- paste0(dataFolder, "/modelOutputs/diversity/", sp)
  spPred <- rasterize(spPred, predRast, names(spPred))
  if(!file.exists(path)){
    dir.create(path)
  }
  saveRDS(spPred, file.path(path, paste0("predictions.rds")))
  rm("spPred")
}  


# Make predictions for all the zones
sp <- "allTaxa"
spPred <- estimateTaxonDiversity(pollinatorSpeciesNames = NULL,
                                 speciesForTaxon =  speciesForTaxon,
                                 taxa = sp,
                                 predRast = predRast,
                                 silent = FALSE)   

path <- paste0(dataFolder, "/modelOutputs/diversity/", sp)
#spPred <- rasterize(spPred, predRast, names(spPred))
if(!file.exists(path)){
  dir.create(path)
}
saveRDS(spPred, file.path(path, paste0("predictions.rds")))

} else {
  # load(paste0(dataFolder, "/modelOutputs/diversity/alpineDiversity.RData"))
  # load(paste0(dataFolder, "/modelOutputs/diversity/borealDiversity.RData"))
  
  alpine <- readRDS("Data/modelOutputs/diversity/alpine/predictions.rds")
  boreal <- readRDS("Data/modelOutputs/diversity/boreal/predictions.rds")
  temperate <- readRDS("Data/modelOutputs/diversity/temperate/predictions.rds")
  
  bees <- readRDS("Data/modelOutputs/diversity/Bees/predictions.rds")
  butterflies <- readRDS("Data/modelOutputs/diversity/Butterflies/predictions.rds")
  hoverflies <- readRDS("Data/modelOutputs/diversity/Hoverflies/predictions.rds")
}

10. Results

10.1. Distribution of pollinators accross Norway

10.1.1. ISDM parameter estimates

We present the estimates of the climatic and trait variable covariates in Figure 16 and Figure 17. The pollinator abundance increases with trait latitude (mLat) and temperature (mBio1) as well as annual temperature (polyBio1 and polyBio2). However, the abundance decreases with an increasing latitude (Latitude). Keeping all other covariates contants, the results shows that we expect butterflies to be more abundant, followed by hoverflies and bees across Norway (see the intercept in Figure 16).

Figure 16: Posterior mean of traits and climatic covariate effect on the distribution of pollinators (bees, butterflies and hoverflies) in Norway. The length of the error line represents the 95% credible intervals.

The pollinators favor built-up areas with reference to agro-forestry (Figure 17. Conversely, the intensity of the pollinators are lower at the other land cover categories with reference to the agro-foresty regions.

Figure 17: Posterior mean of land cover variables on the distribution of pollinators (bees, butterflies and hoverflies) with reference to the agro-forestry category in Norway. The length of the error line represents the 95% credible intervals.

10.1.2. Validation of trait effects

The model selection shows that the ISDM fitted using the original traits is better than those with randomised trait covariates Table 11. This is because the Deviance Information criterion (DIC) and Watanabe-Akaike information criterion (waic) were smaller in the ISDM with original traits.

Table 11: Model selection metrics from the model fitted with the original (Figure 11) and randomised (Figure 12) traits. Three metrics are available: log marginal likelood (mlik), Watanabe-Akaike information criterion (waic) and deviance information criterion (dic). We select the model with lower metrics (where the better column shows if the metric for the ISDM with original traits is smaller than the ISDM with the randomised traits.
Metric Original traits Randomised traits Original traits Better
mlik -1.107112e+06 -1.109778e+06 FALSE
waic 3.881022e+05 4.245882e+05 TRUE
dic -2.155645e+00 -1.857201e+00 TRUE

10.1.3. Spatial bias

We present the posterior mean of the bias field added to the presence-only data observation model (see \(\eta(s)\) described in Section 1.9.1) from the pollinator ISDM in Figure 18. This bias field describes the sampling, detection, reporting and other biases that are present in the presence-only data. We see a smooth spatial field as we have a more even spread of the pollinators accross the study region (see ?@fig-pollinatorPOandNSOS).

Figure 18: Posterior mean of spatial bias field from the pollinator ISDM.

10.1.4. Prediction of pollinator distribution

We present the predictions of log-intensity of the pollinators in Figure 20. We see that the butterfliea are more abundant at Northern Norway, with the hoverflies and bees abundant at the southern region. This seems surprising as the abundance of all the pollinators decreases as the latitude increases and annual temperature increases Figure 16. Future iterations of this work will explore using a more finer mesh to fit the model and make thse predictions to ensure more stable and interpretable results.

Figure 19: Log intensity of pollinators (bees, butterfliesand hoverflies) across the study region from the fitted integrated species distribution model.

Finally, we present the log-intensity of the alpine, boreal and temperate pollinators across Norway in Figure 20. On this scale, we do not see a clear difference between the various zones. The effect of the mBio1 and mLat are not large enough to cause a significant changes in the predicted zones (although on the original scale, we see some changes but not too significant).

Figure 20: Log intensity of pollinators (bees, butterflies and hoverflies) across the study region from the fitted integrated species distribution model.

10.2. Distribution of plant hosts

10.2.1 ISDM parameter estimates

We present the estimates of the climatic and trait variable covariates in Figure 21 and Figure 22. Generally, the plant host occurrence are higher at regions with hgh soil PH and annual mean temperature.

Figure 21: Posterior mean of traits and climatic covariate effect on the distribution of the plant hosts in Norway. The length of the error line represents the 95% credible intervals.

The pollinators favor built-up areas with reference to agro-forestry and other forest (Figure 22). Conversely, the intensity of the pollinators are lower at the other land cover categories with reference to the agro-foresty regions.

Figure 22: Posterior mean of land cover variables on the distribution of plant hosts with reference to the agro-forestry category in Norway. The length of the error line represents the 95% credible intervals.

10.2.2. Spatial bias

We present the posterior mean of the bias field added to the presence-only data observation model (see \(\eta(s)\) described in Section 1.9.1) from the plant hosts ISDM in Figure 23. This bias field describes the sampling and/or reporting biases that are present in the presence-only data. We see that regions with presence-only records have higher intensity than regions with low presence occurrence records(see Figure 9).

Figure 23: Posterior mean of spatial bias field from the pollinator ISDM.

10.2.3 Plant host occurrence probability

We present the posterior mean of the plant host occurrence probability across Norway in Figure 24. Some species (such as Veronica officinalis and Filipendula ulmaria) have relatively higher occurrence probabilities across the region whilst others (such as Hieracium murorum) have relatively smaller occurrence probabilities.

Figure 24: Posterior mean of plant host occurrence probability across the study region.

10.3. Alpha diversity estimates

10.3.1 Diversity at ASO meadows

Although we estimated species richness and Shannon inices, We present the inverse Simpson indices for the alpine Figure 25, boreal Figure 26 and temperate Figure 27 pollinators at the ASO meadows. We group these estimates based on the value of the annual mean temperature (bio1) and latitude (see Section 1.4.2.1 and Section 1.9.5 for details).

Figure 25: Simpson diversity estimates for the alpine pollinators within the ASO data meadows. The diversity indices are colored based on their geoclimatic regions (clusters based on the annual mean temperature and latitude).

It can observed that there are no ASO meadow locations within negative annual temperature and positive latitude zones (Figure 25, Figure 26, Figure 27). Regions with high temperature (positive regions) are more diverse than regions with low temperature. Additionally, meadows at lower latitudes with colder temperatures are the least diverse.

Figure 26: Inverse Simpson diversity estimates for the boreal pollinators within the ASO data meadows. The diversity indices are colored based on their geoclimatic regions (clusters based on the annual mean temperature and latitude).

It is obvious that diversity increases with the species richness and eveness. The species richness across all meadows for each pollinator group (alpine, boreal and temperate pollinators) were all the same, and so the diversity within a meadow reflects how evenly species are distributed within the meadow.

Figure 27: Inverse Simpson diversity estimates for the temperate pollinators within the ASO data meadows. The diversity indices are colored based on their geoclimatic regions (clusters based on the annual mean temperature and latitude).

We also present the Shannon-Weiner and inverse Simpson diversity indices for the alpine, boreal and temperate pollinators in Figure 28. Both indices tell the same story: there are more diverse temperate pollinators, than boreal pollinators and alpine pollinators. When we put all the zones together, we see that the diversity is higher at regions with low altitude and high latitude (last column in Figure 28 and compare with column called annual temperature in Figure 11).

10.3.2. Diversity across Norway

Figure 28: The Shannon Weiner and inverse Simpson diversity indices for alpine, boreal, temperate and all zones estimated across Norway.

Finally, the diversity of bees, butterflies and hoverflies across the study region are presented in the last row in Figure 28. We see that bees and butterflies have similar diversity distribution, with hoverflies being more diverse than both of them. However, all the pollinators are more diverse at regions with low latitude and high mean temperatures (a result consistent within this chapter).

10.4. Pollinator indicators

The reader is referred to Section 1.4 for the results.

11. Export file

12. References

Adjei, Peprah Kwaku, Philip Mostert, Jorge Sicacha Parada, Emma Skarstein, and Robert B O’Hara. 2023. “The Point Process Framework for Integrated Modelling of Biodiversity Data.” arXiv e-Prints, arXiv–2311.
Bachl, Fabian E., Finn Lindgren, David L. Borchers, and Janine B. Illian. 2019. inlabru: An R Package for Bayesian Spatial Modelling from Ecological Survey Data.” Methods in Ecology and Evolution 10: 760–66. https://doi.org/10.1111/2041-210X.13168.
Blangiardo, Marta, and Michela Cameletti. 2015. Spatial and Spatio-Temporal Bayesian Models with r-INLA. John Wiley & Sons.
Chamberlain, Scott, Vijay Barve, Dan Mcglinn, Damiano Oldoni, Peter Desmet, Laurens Geffert, and Karthik Ram. 2024. Rgbif: Interface to the Global Biodiversity Information Facility API. https://CRAN.R-project.org/package=rgbif.
Dorazio, Robert M. 2014. “Accounting for Imperfect Detection and Survey Bias in Statistical Analysis of Presence-Only Data.” Global Ecology and Biogeography 23 (12): 1472–84. https://doi.org/10.1111/geb.12216.
Dormann, Carsten F., Bernd Gruber, and Jochen Fruend. 2008. “Introducing the Bipartite Package: Analysing Ecological Networks.” R News 8 (2): 8–11.
Fithian, William, Jane Elith, Trevor Hastie, and David A. Keith. 2015. “Bias Correction in Species Distribution Models: Pooling Survey and Collection Data for Multiple Species.” Methods in Ecology and Evolution 6 (4): 424–38. https://doi.org/10.1111/2041-210X.12242.
Framstad, Erik, Anders L Kolstad, Signe Nybø, Joachim Töpper, and Vigdis Vandvik. 2022. “The Condition of Forest and Mountain Ecosystems in Norway. Assessment by the IBECA Method.”
GBIF.Org User. 2024a. “Occurrence Download.” The Global Biodiversity Information Facility. https://doi.org/10.15468/DL.ZSEX2M.
———. 2024b. “Occurrence Download.” The Global Biodiversity Information Facility. https://doi.org/10.15468/DL.AWRNR2.
Gómez-Rubio, Virgilio. 2020. Bayesian Inference with INLA. Chapman; Hall/CRC.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. “An Introduction to Statistical Learning.”
Hickler, Thomas, Katrin Vohland, Jane Feehan, Paul A Miller, Benjamin Smith, Luis Costa, Thomas Giesecke, et al. 2012. “Projecting the Future Distribution of European Potential Natural Vegetation Zones with a Generalized, Tree Species-Based Dynamic Vegetation Model.” Global Ecology and Biogeography 21 (1): 50–63.
Hill, Mark O. 1973. “Diversity and Evenness: A Unifying Notation and Its Consequences.” Ecology 54 (2): 427–32.
Illian, Janine, Antti Penttinen, Helga Stoyan, and Dietrich Stoyan. 2008. Statistical Analysis and Modelling of Spatial Point Patterns. John Wiley & Sons.
Isaac, Nick JB, Marta A Jarzyna, Petr Keil, Lea I Dambly, Philipp H Boersch-Supan, Ella Browning, Stephen N Freeman, et al. 2020. “Data Integration for Large-Scale Models of Species Distributions.” Trends in Ecology & Evolution 35 (1): 56–67. https://doi.org/10.1016/j.tree.2019.08.006.
Jost, Lou. 2006. “Entropy and Diversity.” Oikos 113 (2): 363–75.
———. 2007. “Partitioning Diversity into Independent Alpha and Beta Components.” Ecology 88 (10): 2427–39.
Keith, Heather, Bálint Czúcz, Bethanna Jackson, Amanda Driver, Emily Nicholson, and Joachim Maes. 2020. “A Conceptual Framework and Practical Structure for Implementing Ecosystem Condition Accounts.”
Mostert, Philip S, and Robert B O’Hara. 2023. “PointedSDMs: An r Package to Help Facilitate the Construction of Integrated Species Distribution Models.” Methods in Ecology and Evolution 14 (5): 1200–1207.
Museum, Natural History. 2023a. “Data Portal Query on "HOSTS" Created at 2023-06-28 10:30:49.598237 PID.” https://doi.org/https://doi.org/10.5519/qd.izm3kg02.
———. 2023b. “Subset of "HOSTS - a Database of the World’s Lepidopteran Hostplants" (Dataset) PID.” https://doi.org/https://doi.org/10.5519/havt50xw.
Oksanen, Jari, Gavin L. Simpson, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O’Hara, et al. 2022. Vegan: Community Ecology Package. https://CRAN.R-project.org/package=vegan.
Poggio, Laura, Luis M De Sousa, Niels H Batjes, Gerard BM Heuvelink, Bas Kempen, Eloi Ribeiro, and David Rossiter. 2021. “SoilGrids 2.0: Producing Soil Information for the Globe with Quantified Spatial Uncertainty.” Soil 7 (1): 217–40.
Rahayu, Agavia Kori, Risti Endriani Arhatin, Jordan Gacutan, Firdaus Agung, Jessica Pingkan, Annisya Rosdiana, and Irfan Yulianto. 2024. “Optimising Marine Basic Spatial Units (MBSU) for Ocean Accounting Using Empirical Data from Saleh Bay, Indonesia.” One Ecosystem 9: e125578.
Rasmussen, Claus, Yoko L Dupont, Henning Bang Madsen, Petr Bogusch, Dave Goulson, Lina Herbertsson, Kate Pereira Maia, et al. 2021. “Evaluating Competition for Forage Plants Between Honey Bees and Wild Bees in Denmark.” Plos One 16 (4): e0250056.
Robinson, Gaden S., Phillip R. Ackery, Ian Kitching, George W Beccaloni, and Luis M. Hernández. 2023. “HOSTS - a Database of the World’s Lepidopteran Hostplants.” Natural History Museum. https://doi.org/10.5519/HAVT50XW.
Rue, Håvard, Sara Martino, and Nicholas Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations (with Discussion).” Journal of the Royal Statistical Society B 71: 319–92.
Thukral, Ashwani K. 2017. “A Review on Measurement of Alpha Diversity in Biology.” Agricultural Research Journal 54 (1).
Zhou, Jizhong, Beicheng Xia, David S Treves, L-Y Wu, Terry L Marsh, Robert V O’Neill, Anthony V Palumbo, and James M Tiedje. 2002. “Spatial and Resource Factors Influencing High Microbial Diversity in Soil.” Applied and Environmental Microbiology 68 (1): 326–34.
Zuur, AF. 2017. “Beginner’s Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with r-Inla: Using Glm and Glmm Volume i.” Hightland Statistics Ltd., Sl OCLC 973745327: 61.